Analytic Technical Assistance and Development 


Statistical Theory for the 
RCT-YES Software: 
Design-Based Causal 
Inference for RCTs 


Peter Z. Schochet 
Mathematica Policy Research, Inc. 

May 2015 


This report discusses the statistical theory underlying 
the RCT-YES software, which is being developed to 
estimate and report the effects of interventions and 
policies for a wide range of experimental designs. 

As of May 2015, the free software is not yet 
publicly available. Once released, information on 
how to download the software, including the 
RCT-YES User’s Manual, will be posted on the 
IES website for this report. 




NATIONAL CENTER for 
EDUCATION EVALUATION 
AND REGIONAL ASSISTANCE 


Institute of Education Sciences 
U.S. Department of Education 



NCEE 2015-4011 

The National Center for Education Evaluation and Regional Assistance (NCEE) conducts unbiased, 
large-scale evaluations of education programs and practices supported by federal funds; provides 
research-based technical assistance to educators and policymakers; and supports the synthesis and the 
widespread dissemination of the results of research and evaluation throughout the United States. 

May 2015 

This report was prepared for the Institute of Education Sciences (IES) by Decision Information 
Resources, Inc. under Contract ED-IES-12-C-0057, Analytic Technical Assistance and Development. 
The content of the publication does not necessarily reflect the views or policies of IES or the U.S. 
Department of Education, nor does mention of trade names, commercial products, or organizations 
imply endorsement by the U.S. Government. 

This report is in the public domain. While permission to reprint this publication is not necessary, it 
should be cited as: 

Schochet, P. Z. (2015). Statistical theory for the RCT-YES software: Design-based causal inference for RCTs 
(NCEE 2015-4011). Washington, DC: U.S. Department of Education, Institute of Education 
Sciences, National Center for Education Evaluation and Regional Assistance, Analytic Technical 
Assistance and Development. Retrieved from http://ies.ed.gov/ncee/edlabs. 


Acknowledgments 


The author is very grateful for the helpful comments provided by Dr. Don Rubin at Harvard 
University and Dr. Guido Imbens at Stanford University, who are members of the Technical 
Work Group for the project, and by three anonymous referees who reviewed the document as 
part of the peer review process for the Standards and Review Office at the U.S. Department 
of Education (ED). All errors are my own. 



Contents 


Acknowledgments 

Purpose of the report 

1. Overview of the considered RCT designs and methodological topics 

2. RCT-YES data requirements and input specifications 

3. Minimizing disclosure of personally identifiable information 

4. Overview of design-based methods used in RCT-YES 
    a. Original Neyman finite-population (FP) model 
    b. Extending the Neyman-Rubin-Holland model to clustered designs 
    c. Extending the Neyman-Rubin-Holland model to blocked designs 
    d. The super-population model 
    e. Reasons for adopting design-based rather than model-based methods 
    f. The use of robust standard errors as an alternative 
    g. Summary of simulation analysis findings 
    h. Design assumptions 
    i. Brief summary of the considered estimators 

5. Design 1: The non-clustered, non-blocked design 
    a. Finite-population (FP) model without baseline covariates 
    b. Super-population (SP) model without baseline covariates 
    c. Hypothesis testing 
    d. Multiple comparisons adjustments 
    e. FP and SP models with baseline covariates 
        Multiple regression estimator for the FP model 
        Multiple regression estimator for the SP model 
    f. Subgroup analysis 
        Subgroup FP and SP models without baseline covariates 
        Testing for ATE differences across subgroups 
        Subgroup FP and SP models with baseline covariates 
    g. Assessing baseline equivalence 
    h. Treatment of missing outcome data and the use of nonresponse weights 
        Case deletion 
        Using nonresponse weights 
    i. Treatment of missing covariate and subgroup data 
    j. Identification of problem covariates 
    k. Effect size calculations 
    l. The CACE parameter 
        Identification 
        Impact and variation estimation 
    m. Reporting 

6. Design 2: The non-clustered, blocked design 
    a. FP model without baseline covariates 
        Full sample analysis 
        Subgroup analysis 
        Using nonresponse weights 
        Assessing baseline equivalence 
    b. FP model with baseline covariates 
        Full sample analysis 
        Subgroup analysis 
    c. SP model without baseline covariates 
        Full sample analysis for the PATE parameter 
        Subgroup analysis for the PATE parameter 
        The UATE parameter 
    d. SP model with baseline covariates 
    e. Matched pair designs 
    f. The CACE parameter 

7. Design 3: The clustered, non-blocked design 
    a. FP model without baseline covariates 
        Full sample analysis 
        Calculating intraclass correlation coefficients (ICCs) 
        Subgroup analysis 
        Including nonresponse weights 
        Assessing baseline equivalence 
    b. FP model with baseline covariates 
        Full sample analysis 
        Subgroup analysis 
    c. SP model without baseline covariates 
        Full sample analysis for the PATE parameter 
        Subgroup analysis for the PATE parameter 
        The CATE parameter 
        Assessing baseline equivalence 
    d. SP model with baseline covariates 
    e. The CACE parameter 

8. Design 4: The clustered, blocked design 
    a. FP model without baseline covariates 
    b. FP model with baseline covariates 
    c. SP model without baseline covariates 
    d. SP model with baseline covariates 
    e. Matched pair designs 
    f. The CACE parameter 

9. Simulation analysis 
    a. Simulation methods 
    b. Simulation results 

Appendix A: Mathematical proofs 
    Proof of Lemma 5.1 
    Proof of Lemma 5.2 
    Proof of Lemma 5.3 
    Proof of Lemma 5.4 
    Proof of Lemma 5.5 
    Proof of Lemma 5.6 
    Proof of Lemma 6.2 
    Proof of Lemma 7.1 
    Proof of Lemma 7.2 
    Proof of Lemma 7.3 

References 


Tables 


Table 1. Summary of designs in RCT-YES 

Table 2. Dictionary of input statements for RCT-YES 

Table 3. Mathematical notation and acronyms for the statistical analysis 

Table 4. Equation numbers for variance estimators, by design and model specification 

Table 5. Simulation results for Design 3: average of estimated ATEs across replications 

Table 6. Simulation results for Design 3: standard error estimates across replications 

Table 7. Simulation results for Design 3: Type 1 errors across replications 

Table 8. Simulation results for Design 4 for the design-based SP estimator without covariates 

Table 9. Simulation results for Design 4 for the design-based SP estimator with covariates 



Purpose of the report 


The RCT-YES software package estimates average treatment effects for randomized controlled trials 
(RCTs) of interventions and policies, where individuals or groups of individuals are randomly 
assigned to treatment or control groups. The development of RCT-YES was funded by the Institute 
of Education Sciences (IES) at the U.S. Department of Education (ED) to facilitate the conduct of 
RCTs by state and local education agencies to test promising interventions and policies in their 
service areas. Student- and teacher-level data from state longitudinal data systems (SLDSs) provide a 
rich data source for such evaluations, although other data sources could also be used for the analysis. 
By taking advantage of opportunities to conduct RCTs of new or existing policies, “opportunistic 
experiments” offer the chance for education agencies and policymakers to generate rigorous evidence 
about what works in the decisions they make every day. 

RCT-YES estimates average treatment effects, grounded in rigorous statistical theory, for a wide 
range of designs used in education research. The program estimates intervention effects by 
comparing the average outcomes of those randomly assigned to different research conditions for the 
full sample and for baseline subgroups of students, educators, and schools. The program conducts 
hypothesis tests to assess the statistical significance of the estimated effects and reports evaluation 
findings in formatted tables that conform to the presentation of RCT findings in IES-published 
reports. The program was designed to minimize user input for accessing and running the program 
and the data required for estimation. While RCT-YES was developed for RCTs in the education 
area, it is also applicable to RCTs in other fields. It can also be used to estimate intervention effects 
for quasi-experimental designs with comparison groups. 

RCT-YES must be considered a tool for analyzing RCT data, and is not a substitute for researcher 
experience and judgment. A successful RCT hinges on the suitability of the design for addressing 
well-defined causal research questions with sufficient statistical power, the successful 
implementation of the intervention, and high quality study data. Even if these conditions are met, 
a well-conducted analysis of RCT data requires considerable expertise in a range of methodological 
areas, such as the construction of outcome measures, impact estimation methods, hypothesis testing, 
adjustments for missing data, and the interpretation and reporting of evaluation findings. Thus, the 
policy relevance of the results produced by RCT-YES will largely depend on the rigor of the study 
design, the quality of the input data, and user expertise in correctly specifying program inputs and 
interpreting the program output. Where appropriate, users may want to consult with individuals 
trained in RCT methodology to gain the most out of the program. In addition, users may want to 
compare evaluation findings produced by RCT-YES to those found using other software and 
methods. 

This technical report presents the statistical theory underlying RCT-YES. We discuss a unified 
design-based approach for impact estimation using the building blocks of the causal inference model that 
underlies experimental designs. We bring together and build on the recent statistical literature on 
these methods, using in particular Imbens and Rubin (2015), as well as Freedman (2008), Imai, 
King, and Nall (2009), Lin (2013), Miratrix, Sekhon, and Yu (2013), Schochet (2010, 2013), and 
Yang and Tsiatis (2001). The theory also builds on the statistical literature on design-based methods 
for analyzing survey data with complex sample designs (see, for example, Fuller, 1975 and 2009; 
Lohr, 2009; Rao and Shao, 1999). 

Our focus is on the estimation of average treatment effects for a wide range of RCT designs used in 
education research, including blocked and clustered designs. We consider impact estimation for the 
full sample and baseline subgroups. We derive simple differences-in-means estimators as well as 
regression estimators that adjust for baseline covariates. We discuss variance estimation, the 
asymptotic distributions of the considered estimators, hypothesis testing, weighting to account for 
data nonresponse or other reasons, and methods to assess baseline equivalence of the treatment and 
control groups. A simulation analysis is conducted to assess the statistical performance of the design- 
based estimators and other commonly-used RCT estimators. 

The report is intended for methodologists with a strong background in statistical theory, although 
the introductory chapters may be of interest to those with some methodological training who seek 
an overview of design-based statistical inference for RCT designs. The RCT-YES User’s Manual 
provides details on how to run the program in SAS, R, or Stata using a free desktop interface 
application and is intended for a broader audience. It provides a beginner’s introduction to RCTs, 
an intuitive overview of the different designs estimated by the program, and real-world examples of 
program inputs and outputs. Future changes and updates to RCT-YES will be provided in 
supplemental technical documents. 

The remainder of this report is in nine chapters. Chapter 1 provides an overview of the designs and 
methodological topics considered in this report. Chapter 2 provides background information on 
RCT-YES data requirements and program input specifications, and Chapter 3 discusses how the 
program addresses data disclosure issues. Chapter 4 introduces the design-based approach for impact 
estimation for RCT designs, the reasons we adopt this approach rather than the model-based 
approach that is more commonly used in education research, and key statistical assumptions that 
underlie the design-based theory. Chapters 5 to 8 discuss the four main RCT designs in RCT-YES 
that are defined by their clustering and blocking status, and Chapter 9 presents results from a 
simulation analysis to assess the statistical performance of the design-based estimators. Key 
theoretical results are provided in the main text; mathematical proofs are provided in Appendix A. 

1. Designs and Analyses in RCT-YES 


1. Overview of the considered RCT designs and methodological topics 

RCT-YES estimates intervention effects for commonly used education RCT designs that address the 
following two causal research questions: 

1. What are average effects of the intervention on student (or educator) outcomes for the full 
sample? 

2. Do intervention effects differ for key subgroups of students, educators, schools, and 
community contexts defined by their pre-randomization (baseline) characteristics? 

The program addresses these research questions by comparing the mean outcomes of those 
randomly assigned to the treatment and control groups for the full sample and key baseline 
subgroups. 

RCT-YES estimates average treatment effects (hereafter referred to as “ATEs”) for RCT designs defined 
by two key features. First, the designs are defined by the unit of randomization: 

• Non-clustered designs, where individual students are randomly assigned to a treatment or 
control condition. 

• Clustered designs, where groups (such as schools or classrooms) are randomized to a research 
condition. Under these designs, all students within a group are assigned to the treatment or 
control status of their group. 

Second, the designs in RCT-YES are defined by whether random assignment is conducted separately 
within blocks (strata): 

• Non-blocked designs, where random assignment is conducted for a single population (for 
example, within a single school district). These designs can be clustered or non-clustered. 

• Blocked designs, where random assignment is conducted separately within non-overlapping 
subpopulations that comprise the entire sample. Blocked designs can be clustered or non- 
clustered designs. An example of a non-clustered, blocked design is a multi-district RCT 
where students are randomly assigned within each school district (site). Blocked designs also 
include three types of designs that are often used in education research: (1) matched pair 
designs where similar units are paired and random assignment is then conducted within each 
pair, (2) designs where random assignment is conducted separately within demographic 
subgroups (for example, for girls and boys) to ensure treatment-control group balance for 
each subgroup, and (3) longitudinal designs where random assignment is conducted 
separately by cohort (for example, incoming third graders in two separate years). 

This report is structured around the four RCT designs that combine these two key design features. 
These four designs are summarized in Table 1, including their RCT-YES data requirements and 
default specifications for impact estimation. Each considered design is discussed in its own chapter. 
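To make the two design features concrete, the following Python sketch randomizes units separately within blocks. It is illustrative only and is not part of RCT-YES: the function name, the 50 percent assignment rate, and the toy school and district labels are all assumptions. Passing students as the units gives a non-clustered design; passing schools as the units, with all students inheriting the status of their school, gives a clustered design.

```python
import random
from collections import defaultdict

def assign_within_blocks(units, block_of, seed=0):
    """Assign half of the units in each block to treatment (1); the rest are controls (0).

    units: list of unit identifiers (students for non-clustered designs,
           schools or classrooms for clustered designs).
    block_of: dict mapping each unit to its block; use one common block
              label for every unit to mimic a non-blocked design.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    by_block = defaultdict(list)
    for u in units:
        by_block[block_of[u]].append(u)
    status = {}
    for members in by_block.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_treat = len(shuffled) // 2  # assumed 50 percent treatment rate
        for i, u in enumerate(shuffled):
            status[u] = 1 if i < n_treat else 0
    return status

# Hypothetical example: six schools (clusters) in two districts (blocks).
schools = ["s1", "s2", "s3", "s4", "s5", "s6"]
district = {"s1": "A", "s2": "A", "s3": "A", "s4": "A", "s5": "B", "s6": "B"}
school_status = assign_within_blocks(schools, district, seed=2015)

# Under a clustered design, students take the status of their school.
student_school = {"stu1": "s1", "stu2": "s1", "stu3": "s5"}
student_status = {stu: school_status[sch] for stu, sch in student_school.items()}
```

Because assignment is carried out block by block, each district in this toy example ends up with both treatment and control schools, which is the balance property that blocking is meant to guarantee.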

The report covers the following topics: 

• ATE estimation for the full sample as well as population subgroups that are defined by pre- 
intervention (baseline) characteristics (moderator analyses). 

• Simple differences-in-means estimators as well as estimators from regression models that 
adjust for baseline covariates to improve the precision of the ATE estimates. 

• Standard error estimation and significance testing of the null hypothesis of a zero ATE 
against the alternative that it differs from zero, including multiple comparisons corrections. 

• Estimators for (1) finite-population (FP) models where results are assumed to pertain to the 
study sample only (the default RCT-YES specification) and (2) super-population (SP) models 
where results are assumed to generalize outside the study sample to a broader population of 
similar students and schools. 

• Estimators that incorporate weights to adjust for data nonresponse or other reasons. 

• Methods to assess baseline equivalence of the treatment and control groups using baseline 
covariates. 

RCT-YES can estimate impacts on continuous outcomes (such as student achievement test scores) 
and binary outcomes (such as high school graduation status) that are measured after random 
assignment. We consider ordinary least squares (OLS) methods to obtain regression-adjusted impact 
estimates for both continuous and binary outcomes; we do not consider estimation methods using 
logit or probit models for binary outcomes. 
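As a minimal sketch of why OLS is a natural choice for binary as well as continuous outcomes, the Python snippet below computes the simple difference in means for a binary outcome and checks that it coincides with the OLS slope on the treatment indicator when no covariates are included. The function names and the toy data are assumptions made for illustration; this is not RCT-YES code.

```python
from statistics import mean

def diff_in_means(y, t):
    """Difference in mean outcomes between treatments (t=1) and controls (t=0)."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    control = [yi for yi, ti in zip(y, t) if ti == 0]
    return mean(treated) - mean(control)

def ols_slope(y, t):
    """OLS slope from regressing y on an intercept and t (no covariates)."""
    t_bar, y_bar = mean(t), mean(y)
    sxy = sum((ti - t_bar) * (yi - y_bar) for yi, ti in zip(y, t))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return sxy / sxx

# Hypothetical binary outcome (1 = graduated) for 4 treatments and 4 controls.
t = [1, 1, 1, 1, 0, 0, 0, 0]
y = [1, 1, 1, 0, 1, 0, 0, 0]

ate_dm = diff_in_means(y, t)   # 0.75 - 0.25
ate_ols = ols_slope(y, t)      # same value from the regression
```

With a binary outcome, the estimate is a difference in proportions, so it can be read directly as a percentage-point impact.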

We focus on ATE estimation assuming a single treatment and control group (or two treatment 
groups). The methods that we discuss, however, apply also to designs with multiple treatment groups 
where pairs of treatment conditions are compared to each other. Users can estimate these pairwise 
impacts in separate runs of RCT-YES. The program, however, does not correct Type I error rates for 
multiple testing across these pairwise comparisons, which would need to be performed outside the 
program (see Schochet, 2009 for a discussion of these methods). 
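For users who run several pairwise comparisons, the adjustment has to be applied to the p-values collected from the separate runs. The Python sketch below shows two standard corrections, Bonferroni and Holm; these are offered as common examples only, since the text above does not say which method should be used, and the p-values are hypothetical.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, adj)  # enforce monotonicity
        adjusted[i] = running_max
    return adjusted

# Hypothetical p-values from three separate pairwise runs
# (for example, T1 vs. C, T2 vs. C, and T1 vs. T2).
pvals = [0.010, 0.040, 0.030]
p_bonf = bonferroni(pvals)
p_holm = holm(pvals)
```

Holm's procedure controls the same familywise error rate as Bonferroni but is never more conservative, which is why its adjusted p-values here are no larger than the Bonferroni ones.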

RCT-YES provides estimates of the intention-to-treat (ITT) parameter, that is, treatment effects on 
the offer of intervention services. In addition, if data are provided on the take-up of intervention 
services by treatment group members (and control group “crossovers”), RCT-YES provides optional 
estimates of the complier average causal effect (CACE) that pertains to “compliers,” those who 
would receive intervention services as a treatment but not as a control (see, for example, Angrist, 
Imbens, and Rubin, 1996; Bloom, 1984; Heckman, Smith, and Taber, 1998; and Schochet and 
Chiang, 2011). The CACE parameter is also known as the local average treatment effect (LATE) 
parameter or treatment-on-the-treated (TOT) parameter. This report considers design-based 
estimation methods for both the ITT and CACE parameters. 
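The relationship between the two parameters can be sketched with the familiar ratio (instrumental variables) estimator, which divides the ITT effect on the outcome by the treatment-control difference in service receipt rates. The Python snippet below is a simplified illustration under the usual identifying assumptions for this estimator; the function names and data are hypothetical, and the estimators that RCT-YES actually uses are developed in later chapters.

```python
from statistics import mean

def arm_mean(values, t, arm):
    """Mean of `values` among units whose treatment status equals `arm`."""
    return mean([v for v, ti in zip(values, t) if ti == arm])

def cace_estimate(y, t, d):
    """Ratio estimate: ITT effect on the outcome / ITT effect on service receipt.

    y: outcomes; t: treatment status (1/0); d: service receipt (1/0).
    """
    itt_outcome = arm_mean(y, t, 1) - arm_mean(y, t, 0)
    itt_receipt = arm_mean(d, t, 1) - arm_mean(d, t, 0)
    if itt_receipt == 0:
        raise ValueError("No treatment-control difference in service receipt.")
    return itt_outcome / itt_receipt

# Hypothetical data: one treatment no-show and one control crossover.
t = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [10, 12, 14, 8, 9, 7, 8, 8]

itt = arm_mean(y, t, 1) - arm_mean(y, t, 0)  # effect of the offer of services
cace = cace_estimate(y, t, d)                # effect for compliers
```

In this toy example the ITT estimate is 3 and the difference in receipt rates is 0.5, so the estimated effect for compliers is twice the ITT estimate.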

RCT-YES does not conduct additional analyses that researchers sometimes employ to help 
understand the variation in treatment effects in RCTs (see Schochet et al., 2014). For example, the 
program does not conduct analyses to identify mediating factors that account for treatment effects 
on longer-term outcomes, examine the variation in treatment effects for subgroups defined by their 
post-baseline experiences, or estimate quantile treatment effects to assess how intervention effects 
vary along the distribution of an outcome measure. Rather, the focus of RCT-YES and the methods 
presented in this report is on ATE estimation for the full sample and baseline subgroups. 

Table 1. Summary of designs in RCT-YES 

Design 1: Non-clustered, non-blocked 
  Unit of random assignment: Students or other individuals 
  Blocking: None 
  Data requirements and key default RCT-YES specifications for ATE estimation: 
    • Input data requires one record per observation, and outcome data for at least 2 treatments (Ts) 
      and 2 controls (Cs) 
    • Deletion of cases with missing values for the considered outcome 
    • Simple differences-in-means estimator 
    • Finite-population (FP) model 

Design 2: Non-clustered, blocked 
  Unit of random assignment: Students or other individuals 
  Blocking: Districts, schools, classrooms, matched pairs, demographic groups, cohorts 
  Data requirements and key default RCT-YES specifications for ATE estimation: 
    • Input data requires one record per observation with block identifiers that could be masked 
    • Blocks are included if they contain at least 2 Ts and 2 Cs with outcome data; at least 1 T and 
      1 C are required for the super-population (SP) model option and the FP model with the 
      BLOCK_FE option 
    • Deletion of cases with missing values for the considered outcome 
    • Simple differences-in-means estimator within each block; blocks are weighted by their student 
      sample sizes to obtain overall impact estimates 
    • FP model, except for matched pair designs 

Design 3: Clustered, non-blocked 
  Unit of random assignment: Districts, schools, classrooms, etc. 
  Blocking: None 
  Data requirements and key default RCT-YES specifications for ATE estimation: 
    • Input data requires one record per observation or one record per cluster (cluster averages) 
      with cluster identifiers that could be masked 
    • Clusters are included if they contain at least 1 observation with outcome data 
    • Deletion of cases with missing values for the considered outcome 
    • Simple differences-in-means estimator using cluster averages; clusters are weighted equally to 
      obtain overall impact estimates 
    • FP model 

Design 4: Clustered, blocked 
  Unit of random assignment: Districts, schools, classrooms, etc. 
  Blocking: Districts, schools, matched pairs, demographic groups, cohorts 
  Data requirements and key default RCT-YES specifications for ATE estimation: 
    • Input data requirements combine those from Design 2 for blocks and Design 3 for clusters 
    • Simple differences-in-means estimator using cluster averages; clusters are weighted equally to 
      obtain block estimates, and blocks are weighted by their number of clusters to obtain overall 
      impact estimates 
    • FP model, except for matched pair designs 

2. Data file and program inputs 


2. RCT-YES data requirements and input specifications 

The RCT-YES program is free and users can run the program in SAS, R, or Stata using a desktop 
interface application to specify the program inputs. The format of the input dataset must conform 
with the statistical package used for estimation (a .SAS7BDAT file for SAS, an .RDA file for R, or a 
.DTA file for Stata). After specifying all inputs, the interface will generate a SAS, R, or Stata program 
file that users will then need to run in a separate step using procedures that they typically employ to 
run such programs. The program will output an HTML file (named in the interface) containing 
formatted tables that present the impact findings. 

For non-clustered designs (Designs 1 and 2), RCT-YES requires individual-level data with one record 
per individual in the study sample, including those with missing data. Individuals will typically be 
students, but they could also be teachers or principals if the intervention targets educators and their 
outcomes. For clustered designs (Designs 3 and 4), RCT-YES can accommodate data in two formats: 
(1) individual-level data or (2) data that have already been averaged to the cluster level (for example, 
average school test scores for students in the study sample). For the latter format, the input data file 
must contain a separate set of cluster-level averages for the full sample analysis and each subgroup 
analysis. 
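The following Python sketch illustrates how individual-level records map into the default point estimates summarized in Table 1: within-block differences in means weighted by block sample sizes for a non-clustered, blocked design, and equally weighted cluster averages for a clustered, non-blocked design. It is a simplified illustration with hypothetical function names and data; it covers point estimates only and omits the variance estimators, weights, and inclusion rules that RCT-YES applies.

```python
from collections import defaultdict
from statistics import mean

def blocked_ate(records):
    """Design 2 sketch. records: iterable of (block_id, t, y) with t in {0, 1}.

    Computes the difference in means within each block, then averages the
    block estimates using block sample sizes as weights.
    """
    blocks = defaultdict(lambda: {0: [], 1: []})
    for block_id, t, y in records:
        blocks[block_id][t].append(y)
    num = den = 0.0
    for arms in blocks.values():
        n_b = len(arms[0]) + len(arms[1])
        num += n_b * (mean(arms[1]) - mean(arms[0]))
        den += n_b
    return num / den

def clustered_ate(records):
    """Design 3 sketch. records: iterable of (cluster_id, t, y).

    Averages outcomes to the cluster level, then compares the equally
    weighted means of the treatment and control cluster averages.
    """
    outcomes = defaultdict(list)
    status = {}
    for cluster_id, t, y in records:
        outcomes[cluster_id].append(y)
        status[cluster_id] = t  # all members share their cluster's status
    cluster_means = {0: [], 1: []}
    for cluster_id, ys in outcomes.items():
        cluster_means[status[cluster_id]].append(mean(ys))
    return mean(cluster_means[1]) - mean(cluster_means[0])

# Hypothetical blocked data: block A (n = 4, difference 3), block B (n = 6, difference 1).
blocked_rows = [("A", 1, 10), ("A", 1, 12), ("A", 0, 8), ("A", 0, 8),
                ("B", 1, 6), ("B", 1, 6), ("B", 1, 6),
                ("B", 0, 5), ("B", 0, 5), ("B", 0, 5)]
ate_blocked = blocked_ate(blocked_rows)      # (4*3 + 6*1) / 10

# Hypothetical clustered data: two treatment and two control schools.
cluster_rows = [("c1", 1, 10), ("c1", 1, 20), ("c2", 1, 30),
                ("c3", 0, 10), ("c3", 0, 10), ("c3", 0, 10), ("c4", 0, 20)]
ate_clustered = clustered_ate(cluster_rows)  # 22.5 - 15
```

Note that in the clustered example each school counts once regardless of how many students it contributes, which is why supplying pre-averaged cluster-level data yields the same point estimate as supplying individual-level data.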

For all designs, the data file must contain a treatment status indicator variable that is coded as 1 for 
treatments and 0 for controls (or 1 for one treatment group and 0 for another). This variable must 
be available for all observations or RCT-YES will not conduct the analysis. 

The data file does not need to include student identifiers (such as name, address, or date of birth). 
However, the data file must contain block and/or cluster identifiers for Designs 2, 3, and 4 for all 
observations. Importantly, these identifiers could be masked so as not to reveal the specific names or 
locations of blocks or clusters in the sample. 

The data file must contain data on each specified outcome measure, ideally including records with 
missing data so that the program can compute study attrition rates. To estimate impacts for a 
particular outcome, RCT-YES excludes from the analysis observations with missing values for that 
outcome. The program can accommodate weights in the input data file to adjust for data 
nonresponse or other design-related reasons. If weights are specified, they must be positive and 
available for all cases with non-missing outcome data or RCT-YES ignores the weights. 
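The case-deletion and weight rules described above can be sketched as follows. This Python snippet is an illustration only: the function name, the dictionary-based record layout, and the use of None as the missing-value code are assumptions rather than RCT-YES conventions.

```python
def prepare_outcome_sample(rows, outcome, weight=None):
    """Apply case deletion for one outcome and decide whether weights are usable.

    rows: list of dicts, one per individual; None marks a missing value.
    Returns (analysis_rows, attrition_rate, use_weights).
    """
    # Exclude observations with a missing value for this outcome.
    kept = [r for r in rows if r.get(outcome) is not None]
    # Records with missing outcomes stay in the file so attrition can be computed.
    attrition_rate = 1 - len(kept) / len(rows)
    # Weights are used only if positive and present for every retained case.
    use_weights = weight is not None and all(
        r.get(weight) is not None and r[weight] > 0 for r in kept
    )
    return kept, attrition_rate, use_weights

# Hypothetical input records.
rows = [
    {"id": 1, "score": 50.0, "w": 1.0},
    {"id": 2, "score": None, "w": None},  # missing outcome: dropped
    {"id": 3, "score": 62.0, "w": 2.0},
    {"id": 4, "score": 58.0, "w": 1.5},
]
kept, attrition, use_w = prepare_outcome_sample(rows, "score", "w")

# Same data, but one retained case has a non-positive weight.
bad_rows = [dict(r) for r in rows]
bad_rows[2]["w"] = 0.0
_, _, use_w_bad = prepare_outcome_sample(bad_rows, "score", "w")
```

A missing weight on a record that is already dropped for a missing outcome does not invalidate the weights, because the rule applies only to cases with non-missing outcome data.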

If users are interested in conducting subgroup analyses, the file will need to contain categorical 
variables that define the baseline subgroups. Users should be aware that it is good research practice 
to include only a small number of key, policy-relevant subgroups for the analysis that align with the 
study’s conceptual model, and to avoid ex post “fishing” for positive subgroup findings that could 
lead to spurious impact findings (see, for example, Schochet, 2009). RCT-YES will exclude from the 
subgroup analysis cases with missing subgroup data. 

If users are interested in obtaining regression-adjusted ATEs, the data file will need to contain data 
on each specified baseline covariate, which can be continuous or binary. Only a small number of 
covariates that are highly correlated with the outcome measures should be included in the 
regression models to avoid estimation complexities. By default, RCT-YES requires that there must 
be at least 5 observations (clusters) per covariate or the regression analysis is not performed. 
RCT-YES imputes missing covariates for covariates with fewer than 30 percent missing values for both the 
treatment and control groups (using mean imputation), but excludes covariates with more missing 
values from the analysis. These defaults can be changed using program options. 
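The default covariate rules can be sketched as a screening step that runs before the regression. The Python snippet below is a hedged illustration: the function name and record layout are hypothetical, the threshold is treated as "at most 30 percent" in each research group, imputation uses the mean of the non-missing values in the full sample, and the 5-observations rule is checked against the covariates that survive screening. Each of these details is an assumption, since the text above does not pin them down.

```python
from statistics import mean

def screen_covariates(rows, covariates, t_key="t",
                      max_missing_pct=30.0, min_obs_per_cov=5):
    """Return (kept_covariates, imputed_rows) under the assumed default rules.

    rows: list of dicts with a 0/1 treatment key and covariate values
          (None marks a missing value). The input rows are not modified.
    """
    kept = []
    for cov in covariates:
        ok = True
        for arm in (0, 1):  # the missing-data rule is applied to both groups
            arm_rows = [r for r in rows if r[t_key] == arm]
            n_miss = sum(r[cov] is None for r in arm_rows)
            if 100.0 * n_miss / len(arm_rows) > max_missing_pct:
                ok = False
        if ok:
            kept.append(cov)
    # Too few observations per covariate: no regression adjustment is done.
    if kept and len(rows) < min_obs_per_cov * len(kept):
        return [], [dict(r) for r in rows]
    imputed = [dict(r) for r in rows]
    for cov in kept:
        fill = mean([r[cov] for r in rows if r[cov] is not None])
        for r in imputed:
            if r[cov] is None:
                r[cov] = fill  # mean imputation
    return kept, imputed

# Hypothetical data: 5 treatments and 5 controls.
pretest = [50, 60, None, 70, 80, 40, 50, 60, 70, 60]  # 20% missing among Ts
age = [8, 8, 8, 8, 8, None, None, 9, 9, 9]            # 40% missing among Cs
rows = [{"t": 1 if i < 5 else 0, "pretest": p, "age": a}
        for i, (p, a) in enumerate(zip(pretest, age))]

kept_covs, imputed_rows = screen_covariates(rows, ["pretest", "age"])
```

Here the pretest is retained and its one missing value is filled with the sample mean, while age is dropped because too many control group values are missing.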

Table 2 displays a dictionary of input variables for running RCT-YES to provide context for the 
methodological topics covered in this report. Because RCT-YES is being designed to minimize user 
input to accommodate users with diverse backgrounds, the program uses a number of default 
specifications for the analysis; users must be aware that these defaults might not apply in all contexts, 
and should use program options to change the default specifications where appropriate. 




Table 2. Dictionary of input statements for RCT-YES 

Input and Output Data Files 

DATA_FILE 
  Variable definition: Name of input data file 
  Variable format: One record per student, educator, or cluster 
  Additional information: Required 

OUTPUT_FILE 
  Variable definition: Name of output file 
  Variable format: Contains formatted tables 
  Additional information: Required 

TITLE 
  Variable definition: Title for program output 
  Variable format: Character 
  Additional information: Optional 

Design Parameters and Treatment Status, Block, and Cluster Identifiers 

DESIGN 
  Variable definition: Type of design 
  Variable format: 
    1 = Non-clustered, non-blocked 
    2 = Non-clustered, blocked 
    3 = Clustered, non-blocked 
    4 = Clustered, blocked 
  Additional information: Required 

TC_STATUS 
  Variable definition: Name of treatment or control status indicator variable 
  Variable format: 
    0 = Control 
    1 = Treatment 
  Additional information: Required for all observations 

BLOCK_ID 
  Variable definition: Name of variable containing the block identification codes 
  Variable format: Numeric or character 
  Additional information: 
    Required for Designs 2 and 4 for all observations 
    For the default finite-population (FP) model, blocks are included if they contain at least 
    2 treatments and at least 2 controls with outcome data that vary 
    For the optional SP model or BLOCK_FE=1 FP model, at least 1 treatment and at least 
    1 control are needed 

MATCHED_PAIR 
  Variable definition: Indicator for a matched pair design 
  Variable format: 
    0 = Not a matched pair design (default) 
    1 = Matched pair design 
  Additional information: 
    Required for Designs 2 and 4 for matched pair designs 
    Pairs are included only if data are available for both pair members 
    The super-population (SP) model is used for estimation 

CLUSTER_ID 
  Variable definition: Name of variable containing the cluster identification codes 
  Variable format: Numeric or character 
  Additional information: 
    Required for Designs 3 and 4 for all observations 
    Clusters are included if they have at least one observation with outcome data 

CLUSTER_DATA 
  Variable definition: Indicator for clustered designs as to whether the input file contains 
    individual- or cluster-level data 
  Variable format: 
    0 = Cluster-level averages 
    1 = Individual level 
  Additional information: Required for Designs 3 and 4 

CLUSTER_FULL 
  Variable definition: If CLUSTER_DATA = 0, the name of a binary variable in the data file that 
    indicates whether the cluster-level average pertains to the full sample or a subgroup 
  Variable format: 
    0 = Record pertains to a subgroup cluster average 
    1 = Record pertains to the full sample cluster average 
  Additional information: Required for Designs 3 and 4 if CLUSTER_DATA = 0 

SUPER_POP 
  Variable definition: Indicator of preference for the super-population (SP) model 
  Variable format: 
    0 = Finite-population (FP) model (default) 
    1 = SP model 
  Additional information: Optional 

CATE_UATE 
  Variable definition: Indicator for SP designs that the CATE or UATE parameters should be 
    estimated (see text) 
  Variable format: 
    0 = PATE (default) 
    1 = CATE 
    2 = UATE 
  Additional information: Optional for Designs 2 to 4 if SUPER_POP = 1; default is the PATE 
    parameter 

BLOCK_FE 
  Variable definition: Indicator for blocked FP and some SP designs that the model should 
    contain main block effects but not block-by-treatment interactions 
  Variable format: 
    0 = Model should include interactions (default) 
    1 = Model should include main block effects only 
  Additional information: 
    Optional for Designs 2 and 4 
    Applies to the FP model and the CATE parameter for the SP model 

GOT_TREAT 
  Variable definition: Name of variable indicating the receipt of intervention services for the 
    treatment and control groups. The variable should be binary for all designs except if 
    CLUSTER_DATA = 1, in which case the variable should be a numeric service receipt rate 
    between 0 and 1. 
  Variable format: 
    If DESIGN = 1 or 2 or DESIGN = 3 or 4 and CLUSTER_DATA=0: 
      0 = Treatment not received 
      1 = Treatment received 
    If CLUSTER_DATA=1: 
      Numeric: > 0 and < 1 
  Additional information: 
    Optional for estimating complier average causal effects (CACE) pertaining to those who 
    would receive intervention services as a treatment but not as a control 
    Up to 2 variables can be specified for each outcome group 
    Cases with missing values are excluded from both the CACE and ATE analyses 

Outcome Measures, Subgroups, and Covariates 

OUTCOME 
  Variable definition: Name of outcome measure 
  Variable format: Numeric; all missing data codes are valid based on the language used 
    (SAS, Stata, or R) 
  Additional information: 
    Required; outcomes with common analyses can be input in groups 
    Cases with missing values for an outcome are excluded from the analysis for that outcome 

LABEL 
  Variable definition: Label for outcome measure 
  Variable format: 
    Character 
    Blank = Default 
  Additional information: Optional; default is the label in the data file 

STD_OUTCOME 
  Variable definition: Student-level standard deviation of the outcome measure 
  Variable format: 
    Numeric > 0 
    Blank = Default 
  Additional information: 
    Required for Designs 3 and 4 if CLUSTER_DATA = 0 in order for the program to calculate 
    impacts in effect size units 
    Optional for other designs; default is the full sample standard deviation for the control 
    group 

SUBGROUP 
  Variable definition: Name of subgroup variable 
  Variable format: Categorical; all missing data codes are valid based on the language used 
    (SAS, Stata, or R) 
  Additional information: 
    Optional 
    Baseline subgroups can pertain to student, teacher, school, or other characteristics and 
    must be large enough to protect data disclosure 

Input 

Variable 

Variable 

Additional 

variable 

definition 

format 

information 

COVARIATES
Variable definition: List of names of baseline covariates to obtain regression-adjusted impact
estimates for full sample or subgroup analyses.
Variable format: Numeric: continuous or binary; all missing data codes are valid based on the
language used (SAS, Stata, or R).
Additional information: Optional. There must be a minimum number of observations (clusters) per
covariate or regression analyses are not conducted (see OBS_COV below). Covariates are excluded
if they contain too many missing values (see MISSING_COV below). A different set of covariates
can be specified for each outcome and each subgroup.

Baseline Equivalency Analysis 

BASE_EQUIV
Variable definition: List of names of baseline covariates that are to be used to assess baseline
equivalence for treatments and controls.
Variable format: Numeric: continuous or binary; all missing data codes are valid based on the
language used (SAS, Stata, or R).
Additional information: Optional.


Weights 

WEIGHT
Variable definition: Name of the observation-level weight that provides information on how to
weight blocks and/or clusters to obtain pooled estimates and to adjust for missing data
(nonresponse) or unequal sampling probabilities for other design-related reasons.
Variable format: Numeric; Blank = Default weights.
Additional information: Optional; default is equal weighting of all individual records for
non-clustered designs and equal weighting of clusters for clustered designs. A different weight
can be specified for each outcome and each subgroup. Weights must be positive and nonmissing for
cases with available outcome data or the weights are ignored.

Rules for Including Baseline Covariates in the Regression Models

MISSING_COV
Variable definition: Maximum percentage of missing data for a baseline covariate to be included
in the regression models. This condition is applied to both the treatment and control groups.
Variable format: Numeric: 0 to 75.
Additional information: Optional; default is 30. This means that a covariate is included in the
regression models if it is missing for at most 30 percent of cases for both research groups.


OBS_COV
Variable definition: Required ratio of the number of observations per covariate for the
regression analysis and joint test of baseline equivalency to be performed. The variable
pertains to the number of clusters for clustered designs and to the number of blocks for
super-population PATE and UATE blocked designs.
Variable format: Numeric > 1.
Additional information: Optional; default is 5.

Reporting 

MIN_NUM
Variable definition: Minimum group size adopted by the state or other entity for reporting
outcomes to protect personally identifiable information (PII).
Variable format: Integer ≥ 5.
Additional information: Optional; default is 10.

ALPHA_LEVEL
Variable definition: Significance level for testing the null hypothesis of zero average
treatment effects (in percents).
Variable format: Integer: 1 to 30.
Additional information: Optional; default is 5.

3. Minimizing disclosure of personally identifiable information

In reporting results from education RCTs, researchers must consider the protection of personally 
identifiable information (PII) on students and educators. For some data sources, this protection is 
mandated by law. For example, the Family Educational Rights and Privacy Act (FERPA) legally
requires PII protection for student education records. In general, RCT findings should only be 
reported for subgroups that are sufficiently large and for outcomes that have sufficient variation 
across the sample so that it is not possible for someone to infer sensitive information for an 
individual student (such as an achievement test score). Two Technical Briefs published by the 
National Center for Education Statistics (NCES) provide a detailed discussion of data disclosure 
issues for the reporting of statistics using SLDS data (NCES 2011-601, November 2010; NCES
2011-603, December 2010).

It is very difficult to develop a computer program that can prevent PII disclosure in all instances. 
Thus, RCT-YES users will need to carefully assess which impact findings can be reported in their
own contexts. RCT-YES, however, employs several key features to help minimize data disclosure risks.
First, the program provides descriptive statistics on all outcomes, subgroups, and covariates that are
listed as inputs into the program, and provides formatted tables that indicate data problems (for 
example, outcomes or subgroups with small sample sizes). Users can use this information to update 
the input data files and program specifications. 

Second, the program uses several criteria for excluding outcomes, subgroups, and baseline covariates 
from the analysis and for reporting specific impact findings. These criteria follow some of the best 
reporting practices specified in a Technical Brief published by NCES on statistical methods for PII 
protection in the aggregate reporting of state longitudinal data system (SLDS) data (NCES
2011-603, December 2010). These criteria include:

• Omitting outcomes, subgroups, and baseline covariates that have small numbers of 
students with available data. Individual states have adopted minimum group size rules for 
reporting SLDS outcomes to prevent PII disclosure. Most states have set this minimum group
size to be 10 students (the default in RCT-YES), but in 2010, the minimum number ranged 
from 5 to 30. This threshold value can be set using the MIN_NUM input variable in RCT- 
YES (it must be at least 5). The program checks that the minimum size threshold holds for 
both the treatment and control groups. 

• Omitting the entire subgroup category if any subgroup within that category is too small.
If any subgroup has fewer than the minimum number of students from above, the entire
subgroup category is omitted from analysis. For instance, to examine impacts for race/ethnicity
categories, if one category has too few sample members (for example, Pacific Islanders), the
program omits all race/ethnicity categories from the analysis. This procedure is used because
knowledge of the outcomes from the larger subgroups and for the full sample can be used to
calculate the outcomes of students in the small subgroups. In these cases, users should
combine small subgroup categories into larger ones or omit the small subgroups from the
input statements.

• Omitting outcomes and baseline covariates that do not have sufficient variation. RCT-YES
conducts analyses using only outcomes and covariates whose values vary across the sample; 
this condition must hold for both the treatment and control groups. The program excludes 
variables that have zero variance (this removes outcomes that all have the same value). In 
addition, RCT-YES excludes binary outcomes or covariates where there are fewer than 5 
observations with a value of 0 or fewer than 5 observations with a value of 1 for either the 
treatment or control group. 

• Not reporting impact findings for individual blocks (for example, sites) or mean outcomes 
for individual clusters (for example, schools). The concern is that student sample sizes in 
some blocks or clusters might be small, which could lead to data disclosure issues. RCT-YES, 
however, produces summary statistics on impact estimates across blocks so that users can 
examine the variation in the block-specific impact findings. 

• Reporting findings for binary outcomes by multiplying them by 100 and reporting them 
as whole numbers without decimals. This procedure can help guard against data disclosure 
for binary variables with means near 0 or 100 percent. 
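
The exclusion rules listed above can be illustrated with a short sketch. The code below is not
the RCT-YES implementation; it is a minimal Python example under stated assumptions. The
function names, the argument names, and the use of None for missing values are hypothetical, and
the subgroup check assumes that the minimum size is applied to each research group.

```python
from statistics import pvariance


def passes_disclosure_screen(treat_values, control_values, min_num=10, min_binary_cell=5):
    """Return True if a variable passes the screening rules in both research groups.

    min_num plays the role of the MIN_NUM input; min_binary_cell is the
    5-observation rule for binary variables described in the text.
    """
    for values in (treat_values, control_values):
        values = [v for v in values if v is not None]  # drop missing values (assumed None)
        if len(values) < min_num:          # too few cases with available data
            return False
        if pvariance(values) == 0:         # zero variance: all values identical
            return False
        if set(values) <= {0, 1}:          # binary variable: check both cells
            zeros = sum(1 for v in values if v == 0)
            ones = len(values) - zeros
            if zeros < min_binary_cell or ones < min_binary_cell:
                return False
    return True


def subgroup_category_reportable(counts_by_level, min_num=10):
    """Return True only if every level of a subgroup category is large enough.

    counts_by_level maps each subgroup level to (n_treatment, n_control).
    One small level causes the whole category to be omitted.
    """
    return all(nt >= min_num and nc >= min_num for nt, nc in counts_by_level.values())


# Hypothetical data: a binary outcome with 6 zeros and 6 ones in each group passes.
ok_binary = passes_disclosure_screen([0, 1] * 6, [0, 1] * 6)
# A category with one small level is dropped in full.
race_ok = subgroup_category_reportable({"A": (40, 38), "B": (25, 30), "C": (3, 4)})
```

In this sketch a variable must pass every rule in both the treatment and control groups, which
mirrors the statement that the conditions must hold for both research groups.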

The program does not mask variables (by hiding original data with random numbers or characters)
or top- or bottom-code continuous variables (by setting maximum or minimum data values), because
the goal of the program is to generate impact estimates that are transparent and replicable.

4. Overview of design-based methods used in RCT-YES

Design-based methods for experimental designs were introduced by Neyman (1923) and later 
developed in seminal works by Rubin (1974, 1977) and Holland (1986) using a potential outcomes 
framework. A parallel literature exists in econometrics based on the Roy (1951) switching regressions 
model (see Heckman, 2008). This chapter provides an overview of these methods, in an education
context, for each considered design and how they are applied in RCT-YES. Our focus is on the
underlying ATE parameters for each design; estimators for these parameters are discussed in detail 
in Chapters 5-8. We consider student-level analyses, although the methods apply also to teacher- or 
principal-level analyses. The purpose of this overview is to lay the foundation for the more theoretical 
treatment of these methods in the ensuing chapters. We conclude with a discussion of our rationale 
for using design-based methods in RCT-YES instead of model-based and other common methods, a 
summary of simulation findings on the performance of the design-based estimator, key design 
assumptions that underlie all the considered estimators, and a brief summary of the impact and 
variance estimators presented in this report. Table 3 summarizes the notation and acronyms used
for the statistical analysis. 

Table 3. Mathematical notation and acronyms for the statistical analysis

Subscript, Variable, or Acronym    Definition

Subscripts
i = 1,...,n          Students
j = 1,...,m          Clusters, such as schools or classrooms, with n_j students per cluster
b = 1,...,h          Blocks, such as school districts
g = 1,...,s          Subgroups defined by pre-intervention characteristics
k = 1,...,v          Baseline covariates
R,I,B,S              R = Randomization distribution; I,B,S = the universe of students,
                     blocks, and schools in the respective super-populations

Variables
Y(1)                 Potential outcome in the treatment condition
Y(0)                 Potential outcome in the control condition
T                    Treatment status indicator: 1 for treatments, 0 for controls
y                    Observed outcome
G                    Subgroup indicator: 1 for those in subgroup g and 0 otherwise
S                    Block indicator: 1 for those in block b and 0 otherwise
p                    Sampling rate to the treatment group
q                    Proportion of the total sample in a block or subgroup
w                    Weight for aggregating blocks or clusters to obtain pooled estimates
                     and for adjusting for data nonresponse or other reasons
x,z                  Vectors of baseline covariates to obtain regression-adjusted estimators
R                    Data response indicator: 1 for those with nonmissing data and 0 for
                     those with missing data

Key Acronyms
ATE                  Average treatment effect; also referred to as “impact”
CACE                 Complier average causal effect parameter that pertains to intervention
                     effects for those who comply with their treatment assignments
FP, SP               FP = Finite-population model where ATEs are assumed to pertain to the
                     study sample only; SP = Super-population model where ATEs are
                     assumed to generalize to a broader population
ICC                  Intraclass correlation coefficient
OLS, MR              Regression estimators: OLS = Ordinary least squares; MR = multiple
                     regression with baseline covariates
PATE, CATE, UATE     ATE parameters for SP models: PATE = Population average treatment
                     effect (ATE); CATE = Cluster ATE; UATE = Unit ATE
PII                  Personally identifiable information
RCT                  Randomized controlled trial
SLDS                 State longitudinal data system
WWC                  What Works Clearinghouse at the Institute of Education Sciences (IES)

a. Original Neyman finite-population (FP) model

The original Neyman-Rubin-Holland model considered a non-clustered, non-blocked RCT design
(Design 1). To describe this model in the education context, consider an experimental design where
$n$ students from a single population are randomly assigned to either a single treatment or control
condition (or two treatment conditions). Let $Y_i(1)$ be the “potential” outcome (for example, a test
score) for student $i$ in the treatment condition and $Y_i(0)$ be the potential outcome for the same
student in the control condition. Using the original Neyman-Rubin-Holland formulation, these
potential outcomes are assumed to be fixed for the study, which is a finite-population (FP) model.

The difference between the two fixed potential outcomes, $(Y_i(1) - Y_i(0))$, is the student-level
treatment effect, and the ATE parameter, $\beta_{nclus,FP}$, is the average treatment effect over all
students:

(4.1) $\beta_{nclus,FP} = \bar{Y}(1) - \bar{Y}(0) = \frac{1}{n}\sum_{i=1}^{n}(Y_i(1) - Y_i(0))$.

This ATE parameter, also referred to in the literature as the sample average treatment effect (SATE;
see Imbens, 2004), cannot be calculated directly because potential outcomes for each student
cannot be observed in both the treatment and control conditions. Formally, let $T_i$ be the
random assignment variable that equals 1 if a student is assigned to the treatment condition and 0
if the student is assigned to the control condition. The data generating process for the observed
outcome for a student, $y_i$, can then be expressed as follows:

(4.2) $y_i = T_i Y_i(1) + (1 - T_i) Y_i(0)$.

This simple relation formalizes the randomization mechanism that we can observe $Y_i(1)$ if $T_i$
equals 1 and $Y_i(0)$ if $T_i$ equals 0. Design-based methods use the simple relation in (4.2) to
develop simple differences-in-means and regression estimators for $\beta_{nclus,FP}$, their standard
errors, and their large-sample asymptotic distributions for hypothesis testing. In this framework,
the ATE estimators (which are functions of the observed $y_i$) are random only because $T_i$ is random.
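
The finite-population logic can be checked with a small simulation. The sketch below is an
illustration only, not RCT-YES code: the potential outcomes are made-up numbers held fixed, the
only source of randomness is the repeated random assignment, and all function and variable names
are hypothetical. It uses the observed-outcome relation in (4.2), in which a student contributes
the treatment potential outcome when assigned to treatment and the control potential outcome
otherwise.

```python
import random


def sate(y1, y0):
    """Sample average treatment effect: mean of Y_i(1) - Y_i(0), as in (4.1)."""
    return sum(a - b for a, b in zip(y1, y0)) / len(y1)


def observed_outcomes(y1, y0, t):
    """Observed outcome y_i = T_i * Y_i(1) + (1 - T_i) * Y_i(0), as in (4.2)."""
    return [ti * a + (1 - ti) * b for a, b, ti in zip(y1, y0, t)]


def diff_in_means(y, t):
    """Simple differences-in-means estimator of the ATE."""
    treated = [yi for yi, ti in zip(y, t) if ti == 1]
    controls = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(treated) / len(treated) - sum(controls) / len(controls)


rng = random.Random(0)
n, n_treat = 20, 10

# Fixed (hypothetical) potential outcomes for the study sample.
y0 = [rng.gauss(50, 10) for _ in range(n)]
y1 = [b + rng.gauss(5, 2) for b in y0]
true_ate = sate(y1, y0)

# Re-randomize many times; only the assignment vector changes across replications.
estimates = []
for _ in range(5000):
    t = [1] * n_treat + [0] * (n - n_treat)
    rng.shuffle(t)
    estimates.append(diff_in_means(observed_outcomes(y1, y0, t), t))

mean_estimate = sum(estimates) / len(estimates)
print(round(true_ate, 2), round(mean_estimate, 2))
```

Averaged over the randomization distribution, the differences-in-means estimates should be
close to the fixed SATE, which is the sense in which the estimator is unbiased under this model.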

b. Extending the Neyman-Rubin-Holland model to clustered designs 

In RCTs of education interventions, random assignment is often performed at the group level (such 
as a school or classroom) rather than at the student level. These group-based designs are common, 
because education RCTs often test interventions that are targeted to the group (for example, a school 
re-structuring initiative or professional development services for all teachers in a school). Thus, for 
these types of interventions, it is infeasible to conduct random assignment at the student level, even 
though interest often lies in intervention effects on students. In addition, clustered designs are often 
used to minimize the spillover of intervention effects from treatment to control group students 
through their interactions which could contaminate the estimated impacts. 

To extend the Neyman-Rubin-Holland potential outcomes framework to clustered designs, we
follow Schochet (2013) and assume that the sample contains $m$ groups (hereafter referred to as
schools) that are randomly assigned to a research condition, where the sample contains $n_j$ students
from school $j$. Let $Y_{ij}(1)$ and $Y_{ij}(0)$ be fixed potential outcomes for student $i$ in school $j$,
and let $T_j$ be the random assignment variable that equals 1 for treatment schools and 0 for control
schools. The ATE parameter for the clustered design, $\beta_{clus,FP}$, can then be expressed as follows:

(4.3) $\beta_{clus,FP} = \frac{\sum_{j=1}^{m} w_j (\bar{Y}_j(1) - \bar{Y}_j(0))}{\sum_{j=1}^{m} w_j}$,

where $\bar{Y}_j(1) = (\sum_{i=1}^{n_j} Y_{ij}(1)/n_j)$ and $\bar{Y}_j(0) = (\sum_{i=1}^{n_j} Y_{ij}(0)/n_j)$ are
mean potential outcomes in the treatment and control conditions for students in school $j$; $w_j$ are
school-level weights; and $w_{ij}$ are student-level weights.

The ATE parameter $\beta_{clus,FP}$ is a weighted average of the ATE parameters in each school. A central
research question is whether interest lies in intervention effects for (1) the average student in the
sample ($w_{ij} = 1$ and $w_j = n_j$) or (2) a student in the average school in the sample
($w_{ij} = (1/n_j)$ and $w_j = 1$). This distinction will only matter if student sample sizes vary
across schools and ATEs vary by school size. The default weight in RCT-YES is $w_j = 1$, so that each
school is weighted equally in the analysis; this weighting scheme aligns with the random assignment
mechanism. In this case, the ATE parameter is
$\beta_{clus,FP} = \sum_{j=1}^{m} (\bar{Y}_j(1) - \bar{Y}_j(0))/m$. If interest lies instead in ATEs for the
average student, RCT-YES users can include a weight variable in the program input file where
$w_{ij} = 1$ for each observation.
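
To make the two weighting choices concrete, the sketch below computes the weighted average in
(4.3) for a made-up set of three schools. The school-level effects and student counts are
hypothetical, as are the function and variable names; the example only illustrates how the two
parameters can differ when effects vary with school size.

```python
def clustered_ate(school_effects, school_weights):
    """Weighted average of school-level ATEs, following the form of (4.3)."""
    numerator = sum(w * e for w, e in zip(school_weights, school_effects))
    return numerator / sum(school_weights)


# Hypothetical school-level mean treatment effects and student counts.
school_effects = [2.0, 4.0, 9.0]
n_students = [100, 50, 10]

# Student in the average school: each school weighted equally (w_j = 1).
ate_average_school = clustered_ate(school_effects, [1, 1, 1])

# Average student: schools weighted by their student counts (w_j = n_j).
ate_average_student = clustered_ate(school_effects, n_students)

print(ate_average_school, ate_average_student)
```

Here the smallest school has the largest effect, so weighting schools equally gives a larger
value than weighting by student counts. If the three effects were equal, the two weighting
schemes would give the same answer.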

For the clustered design, the data generating process for the observed mean outcome for a school,
$\bar{y}_j$, can be expressed as follows:

(4.4) $\bar{y}_j = T_j \bar{Y}_j(1) + (1 - T_j) \bar{Y}_j(0)$,

where $\bar{y}_j = (\sum_{i=1}^{n_j} y_{ij}/n_j)$.

As discussed in detail in Chapter 7, this simple relation between the observed and potential school-
level outcomes can be used to develop estimators and standard errors for $\beta_{clus,FP}$ that are
nonparametric in the sense that they do not require assumptions on the distributions of potential
outcomes. For a given total student sample size, the variances of the ATE estimators will typically be
larger for clustered than non-clustered designs.

c. Extending the Neyman-Rubin-Holland model to blocked designs 

Blocked designs are common in education research, for example, because RCTs are often conducted
in several sites. In a blocked design, random assignment is conducted separately within each
subpopulation, such as a school district, school, classroom, or matched pair. Blocking will improve
the precision of the ATE estimators if the blocking is based on characteristics associated with the
potential outcomes of interest.

It is straightforward to extend the Neyman-Rubin-Holland model to blocked designs, although the
notation becomes more cumbersome. This can be done by (1) employing the potential outcomes
framework described above separately within each block and (2) averaging the block-specific ATE
parameters and estimators to obtain full population quantities.

Consider first the non-clustered, blocked design (Design 2) with $h$ blocks where we use the subscript
“$b$” to indicate blocks for all the variables defined above. For example, $n_b$ is the number of
students in the sample in block $b$, $Y_{ib}(1)$ and $Y_{ib}(0)$ are potential outcomes for students in
block $b$, and so on. Let $S_{ib}$ be a block indicator variable that equals 1 if student $i$ is in block
$b$ and 0 for students in other blocks. Using Equation (4.1), we can then define the ATE parameter for
block $b$ as follows:

(4.5) $\beta_{nclus,b,FP} = \frac{1}{n_b}\sum_{i:S_{ib}=1}(Y_{ib}(1) - Y_{ib}(0)) = (\bar{Y}_b(1) - \bar{Y}_b(0))$,

where $\bar{Y}_b(1)$ and $\bar{Y}_b(0)$ are mean potential outcomes. The ATE parameter across all blocks
can then be expressed as follows:

(4.6) $\beta_{nclus,blocked,FP} = \frac{\sum_{b=1}^{h} w_b \beta_{nclus,b,FP}}{\sum_{b=1}^{h} w_b}$.

In RCT-YES, the default weights for the non-clustered, blocked design are $w_b = n_b$ and $w_{ib} = 1$ so
that blocks are weighted by their student sample sizes. In some designs where blocks are sites,
researchers may instead want to weight each block equally ($w_b = 1$; $w_{ib} = (1/n_b)$). This approach
yields the ATE parameter for a student in the average site. This weighting approach might be
desirable if student sample sizes vary considerably across sites to avoid the large influence of some
very large sites on the pooled impact estimates.

The clustered design can incorporate blocks in a similar way (Design 4). Using (4.3) for the clustered
design, we can define the ATE parameter in block $b$ as follows:

(4.7) $\beta_{clus,b,FP} = \frac{\sum_{j:S_{jb}=1} w_{jb} (\bar{Y}_{jb}(1) - \bar{Y}_{jb}(0))}{\sum_{j:S_{jb}=1} w_{jb}}$,

where $S_{jb}$ is an indicator variable that equals 1 if school $j$ is in block $b$ and 0 for schools in
other blocks. The ATE parameter across all blocks can then be expressed as follows:

(4.8) $\beta_{clus,blocked,FP} = \frac{\sum_{b=1}^{h} w_b \beta_{clus,b,FP}}{\sum_{b=1}^{h} w_b}$.

The default weights in RCT-YES for (4.7) and (4.8) are $w_{ijb} = (1/n_{jb})$, $w_{jb} = 1$, and
$w_b = m_b$, so that blocks are weighted by their numbers of schools. Another weighting scheme for the
FP model is to weight blocks equally ($w_{ijb} = (1/(n_{jb} m_b))$, $w_{jb} = (1/m_b)$, and $w_b = 1$),
which can be implemented in RCT-YES by including a weight variable in the input data file. Another
option is to weight students equally ($w_{ijb} = 1$, $w_{jb} = n_{jb}$, and $w_b = n_b$).

Importantly, the choice of how to weight the blocks will affect the overall impact findings only if 
ATEs differ across blocks and are correlated with block size. To address this issue, RCT-YES provides 
descriptive statistics on the extent to which impacts vary across blocks and conducts a joint chi- 
squared test to assess whether the difference between the block impacts is statistically significant. 
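
The sensitivity of the pooled impact to the block weights, and one way to test for differences
across blocks, can be sketched as follows. This is an illustration under stated assumptions,
not the RCT-YES procedure: the block impacts, block sizes, and variances are made-up numbers,
the function names are hypothetical, and the heterogeneity statistic shown is one common
precision-weighted form that is compared with a chi-squared critical value on h - 1 degrees of
freedom. The exact statistic used by the program may differ.

```python
def pooled_block_ate(block_impacts, block_weights):
    """Weighted average of block-specific ATEs, following the form of (4.6) and (4.8)."""
    numerator = sum(w * b for w, b in zip(block_weights, block_impacts))
    return numerator / sum(block_weights)


def heterogeneity_statistic(block_impacts, block_variances):
    """Precision-weighted sum of squared deviations of block impacts from their pooled mean.

    Assumed form for illustration; compared below with a chi-squared critical value.
    """
    precisions = [1.0 / v for v in block_variances]
    pooled = pooled_block_ate(block_impacts, precisions)
    return sum(p * (b - pooled) ** 2 for p, b in zip(precisions, block_impacts))


block_sizes = [200, 50, 50]          # hypothetical student counts per block
varying_impacts = [1.0, 4.0, 4.0]    # impacts correlated with block size
constant_impacts = [3.0, 3.0, 3.0]   # identical impacts in every block

size_weighted = pooled_block_ate(varying_impacts, block_sizes)
equal_weighted = pooled_block_ate(varying_impacts, [1, 1, 1])
same_either_way = (pooled_block_ate(constant_impacts, block_sizes)
                   == pooled_block_ate(constant_impacts, [1, 1, 1]))

# Joint test with hypothetical block-level variances of 1.0 each.
q_stat = heterogeneity_statistic(varying_impacts, [1.0, 1.0, 1.0])
CHI2_CRIT_DF2_5PCT = 5.991           # 5 percent critical value, 2 degrees of freedom
reject_equal_impacts = q_stat > CHI2_CRIT_DF2_5PCT
```

When impacts are the same in every block, the weighting choice has no effect on the pooled
value; when they vary with block size, the two schemes give different answers.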

d. The super-population model 

The original Neyman-Rubin-Holland model is a finite-population (FP) model that assumes that 
potential outcomes are fixed for the study. Under this approach, the ATE parameter pertains only 
to those students and schools at the time the study was conducted. Stated differently, the impact 
findings have internal validity but do not necessarily generalize beyond the study participants. This 
approach can be justified on the grounds that study samples are usually purposively selected for RCTs 
for a variety of reasons (such as the site’s willingness to participate and suitability for the study based 
on their populations and contexts). Similarly, students participating in the study may not be 
representative of a broader population of students in the study sites, because they could be a 
nonrandom subset of students who consented to participate in the study and who have available 
follow-up data.

Under this fixed population scenario, researchers are to be agnostic about whether the study results
have external validity. Policymakers and other users of the study results can decide whether the 
impact evidence is sufficient to adopt the intervention on a broader scale, perhaps by examining the 
similarity of the observable characteristics of schools and students included in the study to their own 
contexts, and using results from subgroup and other analyses. 

In contrast, under the super-population (SP) model, potential outcomes are assumed to be random draws 
from super-population distributions. Thus, the impact findings are now assumed to generalize to the 
super-population of students, schools, and sites that are “similar” to those included in the study. The 
interpretation of this super-population will likely depend on the context (and may not exist), but 
researchers should be aware that the estimation of treatment effects using the SP approach makes 
the implicit assumption of external validity to a universe that is likely to be vaguely defined. 
Nonetheless, this approach can be justified on the grounds that policymakers may generalize the 
findings anyway, especially if the study provides a primary basis for deciding whether to implement 
the tested treatments more broadly. 

The literature has been growing on statistical methods to assess and improve the generalizability of 
results from experiments that, in some contexts, could be used to help gauge the credibility of the 
SP model assumptions (see, for example, Hedges and O’Muircheartaigh, 2012; Olsen, Bell, Orr, and
Stuart, 2013; Shadish, Cook, and Campbell, 2002; Stuart, Cole, Bradshaw, and Leaf, 2011; and 
Tipton, 2013). These methods involve reweighting the experimental sample using baseline data so 
that its composition is similar to that of a target population of interest. The reweighting process 
requires comparable baseline data for study and target population members. 

As we shall see in Chapters 5 to 8, the variances of the ATE estimators are typically larger for the SP 
model than the FP model. This is because the ATE parameter for the SP model pertains to 
intervention effects for a broader population, with an associated loss in statistical precision. 

The default specification for RCT-YES is the FP model (except for matched pair designs). Users can 
request the SP model by setting the SUPER_POP input variable equal to 1.

Under the SP approach, the potential outcomes are random variables drawn independently across
the sample. Under this model, the ATE parameter for the non-clustered, non-blocked design
(Design 1) is

(4.9) $\beta_{nclus,SP} = E_I(Y_i(1) - Y_i(0))$,

where $E_I$ signifies the expected value with respect to the simple random sampling of individuals
($I$) from the student super-population. Thus, $\beta_{nclus,SP}$ is the expected treatment effect in
$I$. This SP parameter is also referred to in the literature as the population average treatment
effect (PATE; see Imbens, 2004).

As discussed in Imai, King, and Nall (2009) and Imbens (2004), the SP model is more complex for 
clustered than non-clustered designs, because assumptions must be made about the multilevel sampling 
of schools and students from broader populations. Specifically, under the clustered SP model, it can 
be assumed that (1) schools are fixed for the study, but that students are randomly sampled within 
the study schools from a broader student population (the cluster average treatment effect [CATE]); 
(2) schools are randomly sampled from a broader school population, but that the student sample is 
fixed for the study (the unit average treatment effect [UATE]); or (3) both schools and students are
randomly sampled from broader populations (the population average treatment effect [PATE] for 
clustered designs). 

Because of the subtleties of deciding between these various SP parameters, the default in RCT-YES 
is the PATE parameter (the assumed random sampling of both schools and students), but the 
CATE_UATE option can be used for estimating the CATE and UATE parameters (see Table 2 
above). The PATE parameter for the clustered, non-blocked design (Design 3) is as follows: 

(4.10) $\beta_{clus,PATE} = \frac{E_{IS}(w_j (\bar{Y}_j(1) - \bar{Y}_j(0)))}{E_S(w_j)}$,

where $w_j$ is the weight for school $j$, and $E_{IS}$ is the expected value of the treatment effect in
the super-population of students ($I$) within the super-population of schools ($S$).

In RCT-YES, the default specification for the school-level weight in (4.10) is $w_j = 1$. Users,
however, can select different weighting schemes using input weight variables. In the SP context, if
interest lies in the intervention effect for the average student in $S$, one choice for $w_j$
recommended by Imai, King, and Nall (2009) is a measure of the size of the student universe in each
school (assuming this universe is finite).

The PATE, CATE, and UATE parameters pertain also to blocked SP designs (Designs 2 and 4). By
default, RCT-YES estimates the PATE parameter where blocks are assumed to be randomly sampled
from a broader block population. For instance, if blocks are school districts, the PATE assumption
would imply that the study school districts are representative of a larger population of similar school
districts that could be targeted for the intervention (perhaps in the same state). This assumption
could be realistic if the study contains a large number of geographically dispersed school districts
that could be targeted for the intervention. This design is often referred to in the statistics literature
as a random block design.

The PATE parameter for the non-clustered, blocked SP design (Design 2) is

(4.11) $\beta_{nclus,blocked,PATE} = \frac{E_{IB}(w_b (\bar{Y}_b(1) - \bar{Y}_b(0)))}{E_B(w_b)}$,

where $E_{IB}$ represents the expected value with respect to students in $I$ within the super-population
of blocks ($B$). In RCT-YES, the default weight is $w_b = n_b$, but a broader measure of the block
population size might be more appropriate for the SP model (assuming this population is finite).
Another possible weighting option found in the literature is to set the block weight equal to
$[(1/n_{Tb}) + (1/n_{Cb})]^{-1}$, where $n_{Tb}$ and $n_{Cb}$ are the respective number of treatment and
control students in the block (a form of precision weighting). The corresponding PATE parameter for
the clustered, blocked SP design (Design 4) is $E_{ISB}(\bar{Y}_{jb}(1) - \bar{Y}_{jb}(0))$.
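
As a small numerical illustration of the precision-weighting option, the sketch below assumes
the block weight is the inverse of the sum of the reciprocals of the treatment and control
sample sizes in the block, which is proportional to the inverse of the variance of a
differences-in-means estimate when outcome variances are equal. The function name and the
sample sizes are hypothetical.

```python
def precision_block_weight(n_treat, n_control):
    """Block weight equal to the inverse of (1/n_treat + 1/n_control).

    Assumed form of the precision weight; larger values mean the block's
    impact estimate is expected to be more precise.
    """
    return 1.0 / (1.0 / n_treat + 1.0 / n_control)


# Two hypothetical blocks with the same total size (100 students).
balanced = precision_block_weight(50, 50)     # even treatment-control split
unbalanced = precision_block_weight(90, 10)   # lopsided split

print(balanced, unbalanced)
```

Under this assumed form, a block with an even split receives more weight than a block of the
same total size with a lopsided split, because its impact estimate is more precise.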


e. Reasons for adopting design-based rather than model-based methods

Education researchers typically use model-based, random effects approaches such as hierarchical
linear model (HLM) methods (Raudenbush and Bryk 2002) to analyze RCT data from multilevel
designs. We adopted a design-based framework for RCT-YES for several important reasons. First,
design-based methods do not require assumptions on the distributions of potential outcomes (only
finite moment assumptions), whereas the model-based approaches often assume multilevel
normality that must hold to produce consistent estimates.

Second, design-based approaches produce closed-form expressions for the ATE estimators, unlike
HLM methods that require iterative, numerical maximum likelihood procedures for estimation.
Thus, the estimators under the design-based approach are more transparent and easier to understand 
(and to program into the computer) than the more typical approaches used in education research. 
Although some of the formulas presented in this report look complicated due to cumbersome 
notation, they are all based on simple means and cross-products of the data that can be calculated using 
statistical software packages in common use. 

Third, the model-based approaches are SP models that implicitly assume that the impact findings
can be generalized to a vaguely defined super-population of study units. The design-based approach,
however, allows the analyst to explicitly decide whether it is more realistic to assume internal validity
(the FP model) or external validity (the SP model). Fourth, for clustered designs, data requirements
are fewer for the design-based approach because the analysis can be conducted using data on cluster-
level averages rather than individual-level data. Finally, unlike commonly-used model-based
approaches, the Neyman-Rubin-Holland framework allows for heterogeneity of treatment effects,
which leads to variance expressions that differ for the treatment and control groups, and that differ
for the FP and SP models.

The main advantage of the model-based approach over the design-based approach is that it could
yield more precise ATE estimates. However, this will only necessarily occur if the model is specified
correctly. With misspecification, the model-based approaches could yield biased variance estimates.
The design-based approach instead relies primarily on the randomization mechanism to develop
consistent estimators that do not rely on parametric model assumptions regarding the structure of
model error terms and their distributions. Thus, the design-based approach emphasizes robust
inference and is less concerned with maximizing precision, although simulation findings from
Chapter 9 suggest that precision losses are likely to be small using the design-based estimators.

It is useful to briefly compare the design-based and HLM approaches more formally. Consider a
standard simple differences-in-means estimator for the clustered design for the SP model with school-
level randomization:

(4.12) $\hat{\beta}_{clus,SP} = \frac{1}{\sum_{j:T_j=1}^{m_T} w_j} \sum_{j:T_j=1}^{m_T} w_j \bar{y}_j - \frac{1}{\sum_{j:T_j=0}^{m_C} w_j} \sum_{j:T_j=0}^{m_C} w_j \bar{y}_j$,

where $m_T$ and $m_C$ are the number of treatment and control schools in the sample, respectively,
and other terms are defined as above. The key difference between the model-based and design-based
approaches is the choice of $w_j$. The HLM approach selects weights to maximize the precision of the
impact estimates. These weights are $w_j = n_j/(n_j \sigma_u^2 + \sigma_e^2)$, where $\sigma_u^2$ is the
between-school (Level 2) variance component and $\sigma_e^2$ is the within-school (Level 1) variance
component of the error terms in the HLM model. To apply this method, it is necessary to obtain
consistent estimates of $\sigma_u^2$ and $\sigma_e^2$, which requires the correct specification for the
model error terms and their distributions. In contrast, the weights for the design-based approach
reflect known (or assumed) study selection probabilities from study super-populations, and thus, are
proportional to cluster-level population counts, which do not rely on a model.
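
The role of the weights in the differences-in-means estimator can be illustrated with a short
sketch. The code below is not RCT-YES or HLM software output. It applies the weighted
differences-in-means form of (4.12) to made-up school means under three sets of weights: equal
weights, weights proportional to school size, and precision weights of the form
n_j / (n_j * var_between + var_within), with the variance components treated as known
hypothetical values rather than estimated. All names are hypothetical.

```python
def weighted_diff_in_means(school_means, treat, weights):
    """Weighted treatment-control difference in school mean outcomes, as in (4.12)."""
    def weighted_mean(flag):
        num = sum(w * y for y, t, w in zip(school_means, treat, weights) if t == flag)
        den = sum(w for t, w in zip(treat, weights) if t == flag)
        return num / den
    return weighted_mean(1) - weighted_mean(0)


def precision_weight(n_j, var_between, var_within):
    """Model-based precision weight for a school with n_j students (assumed form)."""
    return n_j / (n_j * var_between + var_within)


# Hypothetical observed school means, treatment indicators, and student counts.
school_means = [60.0, 70.0, 50.0, 58.0]
treat = [1, 1, 0, 0]
n_students = [20, 80, 50, 50]

equal = weighted_diff_in_means(school_means, treat, [1, 1, 1, 1])
by_size = weighted_diff_in_means(school_means, treat, n_students)

# The precision weights need variance components; here they are simply assumed.
hlm_weights = [precision_weight(n, var_between=25.0, var_within=100.0) for n in n_students]
model_based = weighted_diff_in_means(school_means, treat, hlm_weights)

print(equal, by_size, round(model_based, 3))
```

In this example the precision-weighted estimate falls between the equal-weighted and
size-weighted estimates. Its value depends on the assumed variance components, whereas the
other two weighting schemes use only counts that are known from the design.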


f. The use of robust standard errors as an alternative 

It is common in the analysis of RCT data to use standard errors from OLS models that are robust 
to model misspecification, and thus, that could accommodate the implied error structure of the 
RCT design. These estimators include robust, heteroscedasticity-consistent standard errors for 
non-clustered designs (Huber, 1967 and White, 1980) and extensions to clustered designs (Liang and 
Zeger, 1986). These estimators are commonly referred to as HW standard error estimators. There is 
a growing literature on the statistical properties of these estimators, including their small-sample 
weaknesses and ways to compensate for them (see, for example, Angrist and Pischke, 2009; Hausman 
and Palmer, 2011; Imbens and Kolesar, 2012; and Mackinnon, 2011). 

The HW estimators are popular in certain social science disciplines (such as economics) and share 
some features with design-based estimators. However, we did not adopt the HW estimators 
for RCT-YES for several reasons (in addition to those discussed in the last section). First, there is 
some controversy about whether the HW estimators are supported by randomization. For example, 
Freedman (2008) argues that the HW estimators do not conform to the Neyman-Rubin-Holland 
model, whereas Lin (2013) proves that the HW estimators are asymptotically equivalent to the 
design-based FP estimators for the non-clustered design. Second, the attractive feature of the design- 
based approach is that the randomization mechanism defines the model error terms. Thus, variance 
estimators for the design-based approach are derived directly from this known error structure. In 
contrast, the HW estimators provide robust variance estimates for error structures that are unknown. 
Finally, simulation findings presented in Chapter 9 and summarized in the next section suggest that 
the design-based variance estimator performs well, so there is little empirical justification for using 
the HW estimator. 

In sum, we adopt the design-based approach because it aligns directly with the theory underlying 
experiments. Similar to the HW estimators, our variance estimators are based on asymptotic results. 
Thus, an important future research area is to examine the extent to which the literature on the small- 
sample properties of the HW estimators and the associated bias-reducing adjustments are applicable 
to the full range of design-based variance estimators considered in this report. 

g. Summary of simulation analysis findings 

This section summarizes simulation results from Chapter 9 to examine the statistical performance 
of the design-based estimator and two other commonly used RCT estimators: (1) the HLM 
maximum likelihood estimator and (2) the HW estimator that we refer to as a robust cluster standard 
error (RCSE) “sandwich” estimator. The simulations are conducted for a clustered RCT design 
where small sample biases are likely to be more prevalent than for non-clustered designs. We assume 
that (1) schools are the unit of random assignment, (2) student test scores are the outcome of interest, 
and (3) ATEs are estimated using regression models that control for pretest scores to improve the 
precision of the estimates. For the simulations, we employ real-world model parameter assumptions 
and consider a range of distributions for the potential outcomes, including normal distributions 
(that conform with the HLM assumptions) as well as bimodal and mean-centered chi-squared 
distributions to allow for some skewness in the distributions. 

The simulation findings suggest that the design-based ATE estimator performs well for clustered 
education RCTs. Biases of the estimated ATEs are negligible if the sample contains at least 8 schools. 
Furthermore, with a sample of at least 12 schools, the empirical standard errors produced by the 
design-based approach align with their true standard errors, and are comparable to those for the 
HLM and RCSE estimators. Similar findings also pertain to the clustered, blocked design. These 
results suggest that the design-based approach, which is fully based on the random assignment 
mechanism and simple asymptotic variance approximations, is likely to perform well under a range 
of RCT settings. Note that these simulation results do not address statistical power. 

h. Design assumptions 

The design-based estimators in RCT-YES considered in this report all rely on several key assumptions. 
First, they rely on the stable unit treatment value assumption (SUTVA) (Rubin, 1986), which has 
two components: (1) the potential outcomes of a student depend only on that student’s treatment 
assignment and not on the treatment assignments of other students in the sample, and (2) a student 
offered a particular treatment cannot receive different forms of the treatment. SUTVA implies that 
there is a single value of each potential outcome associated with each treatment for each student. 

To describe the first SUTVA “no interference” condition more formally, we first define $Y_i(\mathbf{T}_{nclus})$ 
for the non-clustered design to be the potential outcome for a student given the random vector of 
treatment assignments, $\mathbf{T}_{nclus}$, for all students in the sample. Similarly, for the clustered design, let 
$Y_{ij}(\mathbf{T}_{clus})$ denote the potential outcome for a student in school $j$ given the random vector of all 
school treatment assignments, $\mathbf{T}_{clus}$. We can then state the first SUTVA condition as follows: 

Assumption 4.1: SUTVA (No interference): Under the non-clustered design, for any two random 
assignment vectors $\mathbf{T}_{nclus}$ and $\mathbf{T}'_{nclus}$, if $T_i = T_i'$ for student $i$, then $Y_i(\mathbf{T}_{nclus}) = Y_i(\mathbf{T}'_{nclus})$. Similarly, under the 
clustered design, if $T_j = T_j'$ for school $j$, then $Y_{ij}(\mathbf{T}_{clus}) = Y_{ij}(\mathbf{T}'_{clus})$. 

SUTVA allows us to express $Y_i(\mathbf{T}_{nclus})$ as $Y_i(T_i)$ and $Y_{ij}(\mathbf{T}_{clus})$ as $Y_{ij}(T_j)$. Importantly, for blocked 
designs, SUTVA pertains to each block separately. 

In the education context, the plausibility of SUTVA will likely depend on the nature of the 
intervention and the extent of interactions between students and educators assigned to different 
treatment conditions. For instance, SUTVA is likely to be plausible for clustered designs where 
schools in geographically dispersed areas are randomly assigned to a treatment or control condition, 
because there is likely to be little meaningful interaction between students and educators across 
schools. SUTVA, however, may be less plausible for RCTs where, for example, students are 
randomly assigned within schools, in which case the treatment status of one student could affect the 
outcomes of other students in the school due to peer effects. In these cases, SUTVA could also be 
violated if the nature of the treatment depends on the types of students assigned to the treatment 
group (for example, their academic ability). The second SUTVA condition could also be violated if 
there is considerable teacher turnover so that treatment group students receive different “versions” 
of the treatment over time. 



Without SUTVA, statistical inference for RCTs becomes more complex because the ATE 
parameters discussed above become functions of specific treatment assignment allocations and types 
of treatments offered to students. Hong and Raudenbush (2006) discuss statistical modeling 
methods for estimating ATEs to account for violations to SUTVA. 

The second assumption that underlies the considered designs defines random assignment in terms 
of the independence between treatment status and potential outcomes (see Imbens and Rubin, 
forthcoming, Chapter 3): 

Assumption 4.2: Randomization: $T_i \perp\!\!\!\perp (Y_i(1), Y_i(0))$ for the non-clustered design, and 
$T_j \perp\!\!\!\perp (Y_{ij}(1), Y_{ij}(0))$ for the clustered design for all $i$ and $j$, where the probability of treatment assignment for each 
student (cluster) is between 0 and 1. These independence conditions hold conditional on all covariate values 
defined by pre-randomization characteristics. In addition, for the blocked design, the independence conditions 
hold within each block. 

The final assumption that we invoke specifies finite first and second moments for potential outcome 
distributions: 

Assumption 4.3: Finite first and second moments: To obtain expected values and variances for the 
considered estimators, we assume $E(Y_{ij}(1)) < \infty$, $E(Y_{ij}(0)) < \infty$, $0 < Var(Y_{ij}(1)) < \infty$, and 
$0 < Var(Y_{ij}(0)) < \infty$ for all considered potential outcome distributions. 

i. Brief summary of the considered estimators 

Chapters 5 to 8 of this report provide a detailed discussion of design-based estimators for Designs 1 
to 4 for the FP and SP models. The impact estimators for all designs are based on simple differences 
in mean outcomes between the treatment and control groups or regression models that estimate 
ATEs that adjust for baseline covariates using standard ordinary least squares (OLS) methods. The 
main difference between the impact estimators across the designs is the choice of weights for pooling 
estimates across blocks and/or clusters (if pertinent). In addition, all ATE estimators have 
asymptotically normal distributions, which RCT-YES uses for hypothesis testing. 

The variance estimators, however, differ across designs and model specifications, and much of our 
discussion is focused on this topic. To help readers navigate the myriad variance estimators that we 
present, Table 4 displays, for reference, equation numbers in the text for the variance estimators 
considered in this report for full sample and subgroup analyses. 

Table 4. Equation numbers for variance estimators, by design and model specification 

| Design and model specification | Simple differences-in-means estimators: Full sample^a | Simple differences-in-means estimators: Subgroups^a | Regression estimators: Full sample^a | Regression estimators: Subgroups^a |
|---|---|---|---|---|
| 1. Non-clustered, non-blocked | | | | |
| FP model (default) | 5.10; 5.48_{w,i} | 5.33; 5.50_{w,i} | 5.26a; 5.49_{w,i} | 5.38; 5.51_{w,i} |
| SP model | 5.10_e; 5.48_w | 5.33_e; 5.50_w | 5.26a_e; 5.49_w | 5.38; 5.51_w |
| 2. Non-clustered, blocked | | | | |
| FP model | | | | |
| BLOCK_FE=0 (default) | 6.4 and 6.5; Text on page 73_w | 6.11 and 6.11a; Text on page 73_w | 6.16; Text after 6.16_w | 6.19; Text after 6.19_w |
| BLOCK_FE=1 | 6.9; 6.14_w | 6.13 | 6.17; Text after 6.17_w | 6.20; Text after 6.20_w |
| SP model | | | | |
| PATE (default for matched pair designs) | 6.25 | 6.25 for subgroups | 6.28 | 6.30 |
| BATE | 6.25 | 6.25 for subgroups | 6.28 | 6.30 |
| CATE (BLOCK_FE=0) | Same as FP model_e | Same as FP model_e | Same as FP model_e | Same as FP model_e |
| CATE (BLOCK_FE=1) | Same as FP model | Same as FP model | Same as FP model | Same as FP model |
| 3. Clustered, non-blocked | | | | |
| FP model (default) | 7.12; Text on page 94_w | 7.16; Text on page 94_w | 7.22 | 7.24 |
| SP model | | | | |
| PATE | 7.30 | 7.30 for subgroups | 7.32 | 7.32 for subgroups |
| BATE | Same as FP model_e | Same as FP model_e | Same as FP model_e | Same as FP model_e |
| CATE | Same as PATE | Same as PATE | Same as PATE | Same as PATE |
| 4. Clustered, blocked | | | | |
| FP model | | | | |
| BLOCK_FE=0 (default) | 8.3 | 8.3 for subgroups | 8.9 | 8.12 |
| BLOCK_FE=1 | 8.5 | 8.7 | 8.10 | 8.13 |
| SP model | | | | |
| PATE (default for matched pair designs) | 8.16 | 8.16 for subgroups | 6.28 using cluster averages | 6.30 using cluster averages |
| BATE | Same as PATE | Same as PATE | Same as PATE | Same as PATE |
| CATE (BLOCK_FE=0) | Same as FP model_e | Same as FP model_e | Same as FP model_e | Same as FP model_e |
| CATE (BLOCK_FE=1) | Same as FP model | Same as FP model | Same as FP model | Same as FP model |

^a The subscript “e” denotes that the FP model heterogeneity term is excluded from the variance estimator, the subscript “i” denotes 
that the FP model heterogeneity term is included (subtracted) from the variance estimator, and the subscript “w” denotes Design 1 
and 2 variance estimators that incorporate optional weights. 

5. Design 1: The non-clustered, non-blocked design 

This chapter discusses design-based methods for the simplest RCT design in RCT-YES (Design 1), 
where students are randomly assigned to a treatment or control group within a single population 
such as a school district or school. An example of this design is the Evaluation of the School Choice 
Scholarships Program (Mayer, Peterson, Myers, Tuttle, and Howell, 2002) where volunteer students 
in New York City were randomly assigned to a treatment group who received a school voucher of 
up to $1,400 per year to attend private schools or a control group who did not receive the voucher. 

This chapter discusses the methodological topics considered in this report in much more detail than 
in other chapters. We adopt this presentation to fix concepts and because methods for the simplest 
design lay the foundation for the analysis of more complex designs. For the discussion, we assume 
that the analysis is conducted using only those outcomes, subgroups, and covariates that pass the 
specification checks discussed in Chapter 3 to help minimize PII disclosure of sensitive information. 

a. Finite-population (FP) model without baseline covariates 

Using the notation from Chapter 4, we consider an RCT where $n$ students from a single population 
are randomly assigned to either a single treatment or control condition. The sample contains 
$n_T = np$ treatments and $n_C = n(1-p)$ controls where $p$ is the sampling rate to the treatment group 
($0 < p < 1$). Under the FP model, it is assumed that the $n$ students define the population universe. 
As before, let $Y_i(1)$ and $Y_i(0)$ be potential outcomes in the treatment and control conditions, 
respectively, that are assumed to be fixed for the study. The treatment status indicator variable is 
denoted by $T_i$. The ATE parameter for this FP design is $\beta_{nclus,FP} = \bar{Y}(1) - \bar{Y}(0) = \sum_{i=1}^{n} (Y_i(1) - Y_i(0))/n$. 

The data generating process for the observed outcome, $y_i$, is 

(5.1) $y_i = T_i Y_i(1) + (1 - T_i) Y_i(0)$. 

This simple relationship between the observed and potential outcomes is used to develop design-based 
estimators for $\beta_{nclus,FP}$. In this expression, $y_i$ is random only because $T_i$ is random due to 
random assignment. Note that because treatment and control sample sizes are fixed, the $T_i$ 
indicators are not independent across students. 

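Consider the simple differences-in-means estimator for $\beta_{nclus,FP}$: 

(5.2) $\hat{\beta}_{nclus,FP} = \frac{1}{np} \sum_{i:T_i=1} y_i - \frac{1}{n(1-p)} \sum_{i:T_i=0} y_i = \bar{y}_T - \bar{y}_C$. 

To show that this estimator is unbiased, we use (5.1) to re-write $\hat{\beta}_{nclus,FP}$ as follows: 

$\hat{\beta}_{nclus,FP} = \frac{1}{np} \sum_{i=1}^{n} T_i Y_i(1) - \frac{1}{n(1-p)} \sum_{i=1}^{n} (1 - T_i) Y_i(0)$. 

Because $T_i$ is independent of the potential outcomes due to random assignment (Assumption 4.2), 
the expectation of $\hat{\beta}_{nclus,FP}$ is 

(5.3) $E_R(\hat{\beta}_{nclus,FP}) = \frac{1}{np} \sum_{i=1}^{n} E_R(T_i) Y_i(1) - \frac{1}{n(1-p)} \sum_{i=1}^{n} E_R((1 - T_i)) Y_i(0) = \frac{1}{n} \sum_{i=1}^{n} (Y_i(1) - Y_i(0)) = \beta_{nclus,FP}$, 

where $E_R$ denotes the expectation taken with respect to the randomization distribution ($R$), 
keeping fixed the potential outcomes. The second equality holds because $E_R(T_i) = P(T_i = 1) = p$ and 
$E_R(1 - T_i) = (1 - p)$. 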
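
The unbiasedness result in (5.3) can be checked numerically for a small finite population by averaging the differences-in-means estimator over every possible assignment. The Python sketch below does this by brute-force enumeration; the potential outcome values and the function name are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def diff_in_means(y1, y0, treated):
    """Differences-in-means estimator (5.2) for one assignment.

    y1, y0:  fixed potential outcomes for all n students
    treated: set of indices assigned to the treatment group
    """
    n = len(y1)
    controls = [i for i in range(n) if i not in treated]
    t_mean = sum(y1[i] for i in treated) / len(treated)
    c_mean = sum(y0[i] for i in controls) / len(controls)
    return t_mean - c_mean


# Hypothetical fixed potential outcomes for n = 6 students.
y1 = [12.0, 15.0, 9.0, 20.0, 11.0, 14.0]
y0 = [10.0, 11.0, 9.5, 13.0, 12.0, 10.0]
n, n_t = len(y1), 3

# True finite-population ATE.
ate = sum(a - b for a, b in zip(y1, y0)) / n

# Average the estimator over all equally likely assignments of 3 of 6 students.
estimates = [diff_in_means(y1, y0, set(c)) for c in combinations(range(n), n_t)]
mean_estimate = sum(estimates) / len(estimates)
```

Here `mean_estimate` is the expectation of the estimator over the randomization distribution, and it coincides with `ate` up to floating-point error.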

Consider next an ordinary least squares (OLS) regression model for (5.1) that yields the simple 
differences-in-means estimator, but simplifies some of the proofs presented in this report for other 
designs (especially those for models that include baseline covariates). Following Freedman (2006), 
Schochet (2010), and Yang and Tsiatis (2001), we construct a regression model implied by the 
Neyman-Rubin-Holland model by re-writing (5.1) as follows: 

(5.4) $y_i = \beta_0 + \beta_{nclus,FP}(T_i - p) + u_i$, where 

$\beta_0 = p\bar{Y}(1) + (1 - p)\bar{Y}(0)$, 

$u_i = T_i(Y_i(1) - \bar{Y}(1)) + (1 - T_i)(Y_i(0) - \bar{Y}(0))$. 

Note that using $(T_i - p)$ rather than $T_i$ does not change the OLS estimate of $\beta_{nclus,FP}$ but simplifies 
the proofs. 

In what follows, it is useful to instead express the model “error” term, $u_i$, as follows: 

(5.4a) $u_i = \alpha_i + \tau_i(T_i - p)$, where 

$\alpha_i = p(Y_i(1) - \bar{Y}(1)) + (1 - p)(Y_i(0) - \bar{Y}(0))$, 

$\tau_i = (Y_i(1) - \bar{Y}(1)) - (Y_i(0) - \bar{Y}(0))$. 

In this formulation, $u_i$ is a function of two terms: (1) $\alpha_i$, the mean-centered expected observed 
outcome for the student; and (2) $\tau_i$, the mean-centered student-level treatment effect. 

The model in (5.4) and (5.4a) is unusual because it does not satisfy key assumptions of the OLS 
model. Specifically, $u_i$ does not have mean zero, and, to the extent that $\tau_i$ varies across subjects, $u_i$ 
is heteroscedastic, (weakly) correlated across subjects, and correlated with the regressor $(T_i - p)$: 

$E_R(u_i) = \alpha_i$, $Var_R(u_i) = \tau_i^2 p(1 - p)$, $Cov_R(u_i, u_j) = -\tau_i \tau_j p(1 - p)/(n - 1)$, 

$E_R[(T_i - p)u_i] = \tau_i p(1 - p)$. 

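To derive the OLS estimator for the regression model, define the 1x2 vector of explanatory variables 
for each student as $z_i = [1 \;\; \tilde{T}_i]$ where $\tilde{T}_i = T_i - p$. The OLS estimator for the parameter vector 
$(\beta_0 \;\; \beta_{nclus,FP})'$ is then $(\sum_{i=1}^{n} z_i' z_i)^{-1} \sum_{i=1}^{n} z_i' y_i$. The matrix $\sum_{i=1}^{n} z_i' z_i$ is block diagonal 
because $\sum_{i=1}^{n} \tilde{T}_i = 0$. Note also that $\sum_{i=1}^{n} \tilde{T}_i^2 = np(1 - p)$. Thus, 

(5.5) $\hat{\beta}_{nclus,FP} = \frac{\sum_{i=1}^{n} \tilde{T}_i y_i}{np(1-p)} = \frac{\sum_{i=1}^{n} (T_i - p) y_i}{np(1-p)} = (\bar{y}_T - \bar{y}_C)$, 

which is the simple differences-in-means estimator. Thus, the OLS and simple differences-in-means 
estimators are equivalent and have the same statistical properties. 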
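
The equivalence between the OLS slope on the centered treatment indicator and the simple difference in group means can be confirmed with a few lines of arithmetic. The following Python sketch uses made-up outcome data and hypothetical function names; it relies on the fact noted above that the centered regressor sums to zero, so the slope reduces to a ratio of two sums.

```python
def ols_slope_centered(y, t):
    """OLS slope from regressing y on an intercept and (T_i - p)."""
    n = len(y)
    p = sum(t) / n
    t_tilde = [ti - p for ti in t]
    # Because the centered regressor sums to zero, the slope equals
    # sum(t_tilde * y) / sum(t_tilde ** 2), as in (5.5).
    return sum(a * b for a, b in zip(t_tilde, y)) / sum(a * a for a in t_tilde)


def diff_in_means(y, t):
    """Treatment group mean minus control group mean."""
    y_t = [yi for yi, ti in zip(y, t) if ti == 1]
    y_c = [yi for yi, ti in zip(y, t) if ti == 0]
    return sum(y_t) / len(y_t) - sum(y_c) / len(y_c)


# Hypothetical observed outcomes and treatment indicators.
y = [12.0, 10.0, 9.0, 13.0, 11.0]
t = [1, 0, 1, 0, 0]

slope = ols_slope_centered(y, t)
dim = diff_in_means(y, t)
```

The two quantities agree up to floating-point error even though the groups are unbalanced, since the sum of squared centered indicators equals $np(1-p)$ for any $p$.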

We now state a well-known lemma regarding the statistical properties of $\hat{\beta}_{nclus,FP}$. The proof is 
provided in Appendix A (Imbens and Rubin, forthcoming, provide references for alternative 
proofs). We provide a proof of the lemma because it forms the basis for the proofs for other new 
estimators considered in this report, and allows us to develop all estimators using a common 
mathematical framework. We follow this approach for the remainder of the report. 

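Lemma 5.1. Let $\hat{\beta}_{nclus,FP} = (\bar{y}_T - \bar{y}_C)$ be the simple differences-in-means estimator or, equivalently, the OLS 
estimator for the ATE parameter $\beta_{nclus,FP}$. Then, $\hat{\beta}_{nclus,FP}$ is unbiased with variance: 

(5.6) $Var_R(\hat{\beta}_{nclus,FP}) = \frac{S_T^2}{np} + \frac{S_C^2}{n(1-p)} - \frac{S_\tau^2}{n}$, where 

$S_T^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i(1) - \bar{Y}(1))^2$, $S_C^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i(0) - \bar{Y}(0))^2$, 

$S_\tau^2 = \frac{1}{n-1} \sum_{i=1}^{n} \tau_i^2 = \frac{1}{n-1} \sum_{i=1}^{n} ([Y_i(1) - \bar{Y}(1)] - [Y_i(0) - \bar{Y}(0)])^2$, 
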

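and where the variances are taken with respect to the randomization distribution. Furthermore, as $n$ increases 
to infinity for an increasing sequence of finite populations, assume that: 

(5.7) $S_T^2 \to \tilde{S}_T^2$, $S_C^2 \to \tilde{S}_C^2$, and $S_\tau^2 \to \tilde{S}_\tau^2$, 

where $\tilde{S}_T^2$, $\tilde{S}_C^2$, and $\tilde{S}_\tau^2$ are fixed, nonnegative, real numbers. Then, $\hat{\beta}_{nclus,FP}$ is asymptotically normal with 
asymptotic variance: 

(5.8) $AsyVar_R(\hat{\beta}_{nclus,FP}) = \frac{\tilde{S}_T^2}{np} + \frac{\tilde{S}_C^2}{n(1-p)} - \frac{\tilde{S}_\tau^2}{n}$. 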

The $S_T^2$ and $S_C^2$ terms in (5.6) and (5.8) pertain to the extent to which potential outcomes vary across 
students. The $S_\tau^2$ term pertains to the extent to which treatment effects vary across students. Note 
that if student-level treatment effects are constant, $S_\tau^2 = 0$ and $S_T^2 = S_C^2$. 
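
As a numerical check on the variance formula in Lemma 5.1, the Python sketch below compares the closed-form expression in (5.6) with the exact randomization variance obtained by enumerating all assignments in a small finite population. The potential outcomes are hypothetical, and the enumeration is feasible only because $n$ is tiny; this is an illustration, not part of RCT-YES.

```python
from itertools import combinations


def sample_var(x):
    """Variance with an (n - 1) divisor, matching S_T^2, S_C^2, and S_tau^2."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)


# Hypothetical fixed potential outcomes for n = 6 students.
y1 = [12.0, 15.0, 9.0, 20.0, 11.0, 14.0]
y0 = [10.0, 11.0, 9.5, 13.0, 12.0, 10.0]
n, n_t = len(y1), 3
p = n_t / n

# Closed-form variance from (5.6); sample_var centers the treatment effects.
tau = [a - b for a, b in zip(y1, y0)]
s2_t, s2_c, s2_tau = sample_var(y1), sample_var(y0), sample_var(tau)
var_formula = s2_t / (n * p) + s2_c / (n * (1 - p)) - s2_tau / n

# Exact randomization variance by enumerating every assignment.
ests = []
for c in combinations(range(n), n_t):
    t_mean = sum(y1[i] for i in c) / n_t
    c_mean = sum(y0[i] for i in range(n) if i not in c) / (n - n_t)
    ests.append(t_mean - c_mean)
mean_est = sum(ests) / len(ests)
var_exact = sum((e - mean_est) ** 2 for e in ests) / len(ests)

# Lower bound on the heterogeneity term: S_tau^2 >= (S_T - S_C)^2.
lower_bound = (s2_t ** 0.5 - s2_c ** 0.5) ** 2
```

In this example `var_formula` and `var_exact` agree, and `s2_tau` is at least as large as `lower_bound`, which is the inequality that motivates an upper bound variance estimator when $S_\tau^2$ cannot be observed.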


Unbiased estimates for $S_T^2$ and $S_C^2$, and for their limiting values, can be obtained using sample variances for the 
treatment and control groups, $s_T^2$ and $s_C^2$, respectively: 

(5.9) $s_T^2 = \frac{1}{np - 1} \sum_{i:T_i=1} (y_i - \bar{y}_T)^2$ and $s_C^2 = \frac{1}{n(1-p) - 1} \sum_{i:T_i=0} (y_i - \bar{y}_C)^2$. 

The variance terms pertaining to the individual treatment effects, $S_\tau^2$ and its limiting value, are not identifiable 
because it is not possible to observe an individual in both the treatment and control conditions.* 
Note however, that $S_\tau^2 \geq (S_T - S_C)^2$. Thus, RCT-YES uses the following upper bound estimator for 
the variance expressions in (5.6) and (5.8): 


* Schochet (2009) discusses several approaches using baseline covariates for estimating $S_\tau^2$ using $\hat{\tau}_i$, 
where $\hat{\tau}_i$ is an estimate of the treatment effect for student $i$. These methods include propensity score matching where treatments are 
matched to controls, estimating a regression model with a large number of subgroup-by-treatment status interaction terms, and 
assuming that the intervention does not change the rank ordering of the outcome measures within each research condition. 

(5.10) $\widehat{Var}_R(\hat{\beta}_{nclus,FP}) = \frac{s_T^2}{np} + \frac{s_C^2}{n(1-p)} - \frac{(s_T - s_C)^2}{n}$. 


In the remainder of this document, we refer to the final term in (5.10) as the “FP heterogeneity 
term.” 


In the calculations, RCT-YES estimates the treatment group sampling rate using $\hat{p} = n_T/(n_T + n_C)$. 
The estimation of $p$ is discussed more fully in Section 5h where we discuss the treatment of missing 
data. 

Note that Lemma 5.1 applies to both continuous and binary outcomes. RCT-YES does not estimate 
logit (or probit) models for several reasons. First, randomization does not support the use of these 
nonlinear models (Freedman, 2008). Second, the treatment effect parameter differs for logit models 
with and without covariates, and models with covariates tend to reduce precision (see, for example, 
Schochet, 2013). Finally, estimated treatment effects in log odds units are difficult to interpret for 
some outcomes in social policy research, and translating the estimated logit parameters into more 
interpretable impacts on proportions (rates) complicates variance estimation, especially for clustered 
designs and models with covariates. 


b. Super-population (SP) model without baseline covariates 

The FP model can be extended to the SP model (see, for example, Imbens and Rubin, forthcoming; 
Schochet, 2010; and Yang and Tsiatis 2001). Under this approach, the $n$ subjects are assumed to be 
a random sample from a student super-population, which, for simplicity, we hereafter assume is 
infinite (so that finite population corrections do not apply).^ It is assumed that the potential 
outcomes, $Y_i(1)$ and $Y_i(0)$, for the $n$ study students are random draws from potential treatment and 
control outcome distributions in the super-population, with finite means and variances. These two 
outcome distributions also define the distribution of subject-level treatment effects in the 
super-population. As before, the sample contains $n_T = np$ treatments and $n_C = n(1 - p)$ controls where 
$p$ is the sampling rate to the treatment group ($0 < p < 1$). 

The ATE parameter for the non-clustered, non-blocked, SP model is $\beta_{nclus,SP} = E_I(Y_i(1) - Y_i(0))$, 
where $E_I$ signifies the expected value with respect to the simple random sampling of individuals 
from the student super-population ($I$). 


^ In designs where the study sample is actually randomly selected from the super-population, finite population corrections apply and 
depend on sampling rates from the super-population. In this setting, our SP variance estimators are conservative. 

Consider the simple differences-in-means estimator from above, $\hat{\beta}_{nclus,SP} = (\bar{y}_T - \bar{y}_C)$. Following 
Imbens and Rubin (2015), we can show that this estimator is unbiased by using the law of iterated 
expectations. First, we calculate the expectation of $\hat{\beta}_{nclus,SP}$ with respect to the randomization 
distribution, conditional on the $n$ students who are selected for the study and their fixed potential 
outcomes (denoted by the vectors $\mathbf{Y}(1) = (Y_1(1), Y_2(1), \ldots, Y_n(1))$ and $\mathbf{Y}(0) = (Y_1(0), Y_2(0), \ldots, Y_n(0))$). 
Second, we average over random draws of $n$ students from $I$. Using this approach, we find that: 

(5.11) $E_{RI}(\hat{\beta}_{nclus,SP}) = E_I(E_R(\hat{\beta}_{nclus,SP} \mid \mathbf{Y}(1), \mathbf{Y}(0), n)) = E_I\left(\frac{1}{n} \sum_{i=1}^{n} (Y_i(1) - Y_i(0))\right) = \beta_{nclus,SP}$, 

where the second equality holds using (5.3) for the FP model. This proves that $\hat{\beta}_{nclus,SP}$ is unbiased. 

We can use a similar conditioning approach to calculate the variance of $\hat{\beta}_{nclus,SP}$ using the law of 
total variance where, to simplify notation, we do not display the conditioning set $(\mathbf{Y}(1), \mathbf{Y}(0), n)$: 

(5.12) $Var_{RI}(\hat{\beta}_{nclus,SP}) = E_I(Var_R(\hat{\beta}_{nclus,SP})) + Var_I(E_R(\hat{\beta}_{nclus,SP}))$. 


Using variance results for the FP model in (5.6), we have that 

(5.13) $E_I(Var_R(\hat{\beta}_{nclus,SP})) = \frac{\sigma_{TI}^2}{np} + \frac{\sigma_{CI}^2}{n(1-p)} - \frac{\sigma_{\tau I}^2}{n}$, 

where $\sigma_{TI}^2 = E_I(S_T^2)$ is the variance of $Y_i(1)$, $\sigma_{CI}^2 = E_I(S_C^2)$ is the variance of $Y_i(0)$, and 
$\sigma_{\tau I}^2 = E_I(S_\tau^2)$ is the variance of student-level treatment effects in the super-population. 

Similarly, because the differences-in-means estimator is unbiased for the FP model, we have that 

(5.14) $Var_I(E_R(\hat{\beta}_{nclus,SP})) = \frac{Var_I(Y_i(1) - Y_i(0))}{n} = \frac{\sigma_{\tau I}^2}{n}$. 


Thus, collecting terms in (5.13) and (5.14), we find that 

(5.15) $Var_{RI}(\hat{\beta}_{nclus,SP}) = \frac{\sigma_{TI}^2}{np} + \frac{\sigma_{CI}^2}{n(1-p)}$. 

This is the standard variance formula that is typically used in education research for RCTs using 
simple differences-in-means estimators, except that different variances apply for the treatment and 
control groups. Note that this variance does not contain the heterogeneity term $\sigma_{\tau I}^2/n$ that reduces 
the variance formula for the FP model. Thus, in principle, variances are larger under the SP model 
than under the FP model, because the SP parameter pertains to a broader population, with an associated loss 
in statistical precision. The variance in (5.15) can be estimated using (5.10), excluding the final FP 
heterogeneity term. 
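
A small numerical example may help show how the FP and SP variance estimates relate. The Python sketch below computes both from group-level summary statistics: the SP estimate uses the first two terms of (5.10), and the FP upper bound estimate subtracts the heterogeneity term. The function name and input values are hypothetical.

```python
def variance_estimates(s2_t, s2_c, n_t, n_c):
    """Return (FP upper-bound, SP) variance estimates.

    s2_t, s2_c: sample variances of the treatment and control groups
    n_t, n_c:   treatment and control sample sizes
    """
    n = n_t + n_c
    sp = s2_t / n_t + s2_c / n_c
    # FP heterogeneity term: (s_T - s_C)^2 / n, subtracted from the SP estimate.
    fp = sp - (s2_t ** 0.5 - s2_c ** 0.5) ** 2 / n
    return fp, sp


# Hypothetical inputs: standard deviations of 4 and 3, with 50 students per group.
fp_var, sp_var = variance_estimates(s2_t=16.0, s2_c=9.0, n_t=50, n_c=50)
```

With these inputs the SP estimate is 0.50 and the FP estimate is 0.49, so the heterogeneity term lowers the variance by 2 percent. The two estimates coincide whenever the treatment and control group sample variances are equal.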

As with the FP model, we can obtain equivalent results for the SP model using an OLS regression 
model. Let $\mu_{TI} = E_I(Y_i(1))$ and $\mu_{CI} = E_I(Y_i(0))$ denote potential outcome means in the 
super-population, and let $\sigma_{TI}^2 = E_I(Y_i(1) - \mu_{TI})^2$ and $\sigma_{CI}^2 = E_I(Y_i(0) - \mu_{CI})^2$ denote super-population 
variances. We can then construct a regression model by re-writing (5.1) as follows: 

(5.16) $y_i = \beta_0 + \beta_{nclus,SP}(T_i - p) + \theta_i$, where 

$\beta_0 = p\mu_{TI} + (1 - p)\mu_{CI}$, 

$\beta_{nclus,SP} = (\mu_{TI} - \mu_{CI})$, 

$\theta_i = \alpha_{iI} + \tau_{iI}(T_i - p)$, 

$\alpha_{iI} = p(Y_i(1) - \mu_{TI}) + (1 - p)(Y_i(0) - \mu_{CI})$, 

$\tau_{iI} = (Y_i(1) - \mu_{TI}) - (Y_i(0) - \mu_{CI})$. 

This regression model satisfies the usual OLS assumptions except that error variances differ across 
the two research groups. In order to see this, note that similar to the usual OLS model, the model 
error term, $\theta_i$, has mean zero and is uncorrelated with $(T_i - p)$: 

(5.17) $E_{RI}(\theta_i) = E_{RI}(\alpha_{iI}) + E_{RI}[\tau_{iI}(T_i - p) \mid T_i = 1]p + E_{RI}[\tau_{iI}(T_i - p) \mid T_i = 0](1 - p) = 0$, 

$E_{RI}[(T_i - p)\theta_i] = E_{RI}[(T_i - p)\theta_i \mid T_i = 1]p + E_{RI}[(T_i - p)\theta_i \mid T_i = 0](1 - p) = 0$. 

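Furthermore, the variance of $\theta_i$ differs for the treatment and control groups, and $\theta_i$ is uncorrelated 
across individuals: 

(5.18) $Var_{RI}(\theta_i \mid T_i = 1) = E_{RI}[(\alpha_{iI} + \tau_{iI}(T_i - p))^2 \mid T_i = 1] = \sigma_{TI}^2$, 

$Var_{RI}(\theta_i \mid T_i = 0) = \sigma_{CI}^2$, 

$Cov_{RI}(\theta_i, \theta_{i'}) = E_{RI}(\theta_i \theta_{i'}) = 0$. 
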

Similar to the FP model, the OLS estimator for the SP model is $\hat{\beta}_{nclus,SP} = (\bar{y}_T - \bar{y}_C)$. We now state 
a lemma regarding the statistical properties of $\hat{\beta}_{nclus,SP}$ under the SP model. The proof using the 
regression approach is provided in Appendix A and follows Schochet (2010). 

Lemma 5.2. Let $\hat{\beta}_{nclus,SP} = (\bar{y}_T - \bar{y}_C)$ be the simple differences-in-means or OLS regression estimator for 
$\beta_{nclus,SP}$ under the SP model in (5.16). Then, $\hat{\beta}_{nclus,SP}$ is unbiased and asymptotically normal with variance: 

(5.19) $Var_{RI}(\hat{\beta}_{nclus,SP}) = \frac{\sigma_{TI}^2}{np} + \frac{\sigma_{CI}^2}{n(1-p)}$. 

Unbiased estimates for $\sigma_{TI}^2$ and $\sigma_{CI}^2$ can be obtained using $s_T^2$ and $s_C^2$ defined in (5.9) above. 

c. Hypothesis testing 

The estimators for the FP and SP models are asymptotically normally distributed and their variances 
can be approximated by chi-squared distributions. Thus, RCT-YES uses t-statistics and associated t- 
distributions to test the null hypothesis of a zero average treatment effect against the alternative that 
it differs from zero. The null hypothesis is that intervention effects average to zero, but could differ 
across sample members (that is, they could be positive for some students and negative for others). 

To test $H_0: \beta_{nclus,FP} = 0$ versus $H_1: \beta_{nclus,FP} \neq 0$ for the FP model, RCT-YES uses the following test 
statistic: 

$t = \frac{\hat{\beta}_{nclus,FP}}{\sqrt{\widehat{Var}_R(\hat{\beta}_{nclus,FP})}} = \frac{\bar{y}_T - \bar{y}_C}{\sqrt{(s_T^2/n_T) + (s_C^2/n_C) - ((s_T - s_C)^2/n)}}$, 

and similarly for the SP model. RCT-YES applies a two-tailed test for hypothesis testing to be agnostic 
about whether the intervention will improve all considered outcomes. The program uses a 5 percent 
significance level ($\alpha = .05$) by default, but it can be changed using the ALPHA_LEVEL input 
variable. For simplicity, the t-tests are conducted using $(n_T + n_C - 2)$ degrees of freedom.^ RCT-YES 
reports p-values from hypothesis tests for each outcome that is input into the program as well as 
estimated standard errors. The program does not report confidence intervals, but we urge program 
users to examine them to help interpret evaluation findings. Confidence intervals around the 
estimated ATEs can be calculated by adding and subtracting the standard errors multiplied by $T^{-1}(1 - (\alpha/2))$, where $T^{-1}$ 
is the inverse of the t distribution function with $(n_T + n_C - 2)$ degrees of freedom. 

^ We considered using the Satterthwaite (1952) degrees of freedom approximation for two-sample t-tests with unequal population 
variances, but these approximations become complex for regression estimators with baseline covariates. We also decided not to use 
the Bell and McCaffrey (2002) degrees of freedom adjustment developed for robust HW standard errors, because they may not apply 
to all our considered design-based estimators. Furthermore, Imbens and Kolesar (2012) show that these adjustments only improve 
inferences in small samples for unbalanced RCTs where sampling rates differ markedly for the treatment and control groups, which 
is rare for social policy RCTs. These adjustments may be employed in future versions of RCT-YES after more research is conducted 
to assess their performance for the full range of designs considered in this report. 
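
The test statistic, two-tailed p-value, and confidence interval described above can be sketched in Python as follows. To keep the example free of third-party dependencies, the t distribution function is evaluated by numerical integration and inverted by bisection; these numerical shortcuts, the function names, and the input values are assumptions made for illustration, and a statistical library would normally be used instead. This is not RCT-YES code.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(t, df, steps=2000):
    """Distribution function of Student's t via Simpson's rule on [0, |t|]."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_quantile(q, df):
    """Inverse distribution function by bisection (assumes 0.5 < q < 1)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_test_and_ci(est, var, n_t, n_c, alpha=0.05):
    """Two-tailed t-test and confidence interval with n_T + n_C - 2 df."""
    se = math.sqrt(var)
    df = n_t + n_c - 2
    t_stat = est / se
    p_value = 2 * (1 - t_cdf(abs(t_stat), df))
    half_width = t_quantile(1 - alpha / 2, df) * se
    return t_stat, p_value, (est - half_width, est + half_width)


# Hypothetical ATE estimate and variance estimate for 50 students per group.
t_stat, p_value, ci = t_test_and_ci(est=2.0, var=0.49, n_t=50, n_c=50)
```

In this hypothetical case the confidence interval excludes zero and the p-value falls below the default 5 percent significance level, so the two summaries lead to the same conclusion.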

RCT-YES does not conduct randomization tests (also known as permutation or Fisher exact tests) to 
test the sharp null hypothesis of no intervention effects for any individual: $H_0: Y_i(1) = Y_i(0)$ for 
$i = 1, \ldots, n$. Under this approach, the exact distribution of the test statistic under the null 
hypothesis can be obtained by calculating the test statistic for each possible permutation of 
individuals to the treatment and control groups and locating the observed test statistic in this 
distribution to calculate p-values. The test statistics can include a wide range of statistics measuring 
treatment-control differences in outcome values, such as differences in means, the natural logarithm 
of means (or other variable transformations), medians or other quantiles, or mean ranks. This 
approach has the advantage that it does not rely on asymptotic theory for hypothesis testing. 
Furthermore, statistical inference for this approach may have greater robustness and statistical power 
under various alternative hypotheses in the presence of outliers in the outcome data and if treatment 
effects are not additive. The current version of RCT-YES does not adopt this approach because of 
current IES standards for significance testing under RCT designs. However, this approach may be 
available in future versions of RCT-YES. 
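
Although RCT-YES does not implement it, the randomization test described above is simple to sketch for a small sample. The Python example below enumerates every possible assignment, recomputes the difference in means under the sharp null (where each student's observed outcome is unchanged by reassignment), and reports the share of assignments with a statistic at least as extreme as the observed one. The data and function name are hypothetical, and full enumeration is practical only for small $n$.

```python
from itertools import combinations


def randomization_p_value(y, t):
    """Exact two-sided p-value for the sharp null using differences in means."""
    n, n_t = len(y), sum(t)

    def stat(treated):
        y_t = [y[i] for i in treated]
        y_c = [y[i] for i in range(n) if i not in treated]
        return sum(y_t) / len(y_t) - sum(y_c) / len(y_c)

    observed = stat({i for i in range(n) if t[i] == 1})
    # Under the sharp null, outcomes stay fixed for every possible assignment.
    stats = [stat(set(c)) for c in combinations(range(n), n_t)]
    # Small tolerance guards against floating-point ties.
    extreme = sum(1 for s in stats if abs(s) >= abs(observed) - 1e-12)
    return extreme / len(stats)


# Hypothetical outcomes: three treated and three control students.
y = [20.0, 18.0, 19.0, 10.0, 11.0, 12.0]
t = [1, 1, 1, 0, 0, 0]
p_value = randomization_p_value(y, t)
```

Here only the observed assignment and its mirror image produce a difference as large as 8 in absolute value, so the exact p-value is 2 of 20 assignments. Other test statistics, such as differences in medians or mean ranks, could be substituted for the difference in means inside `stat`.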

d. Multiple comparisons adjustments 

In RCTs, researchers often conduct multiple hypothesis tests to address key impact evaluation 
questions. In such instances, separate t-tests for each contrast are often performed to test the null 
hypothesis of no impacts, where the Type I error rate (statistical significance level) is typically set at 
$\alpha = 5$ percent for each test. This means that, for each test, the chance of erroneously finding a 
statistically significant impact is 5 percent. However, when the hypothesis tests are considered 
together, the “combined” Type I error rate could be considerably larger than 5 percent. For example, 
if all null hypotheses are true, the chance of finding at least one spurious impact is 23 percent if 5 
independent tests are conducted, and 64 percent for 20 tests. Thus, without accounting for the 
multiple comparisons being conducted, users of the study findings may draw unwarranted 
conclusions. 
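
The combined error rates quoted above follow from the formula $1 - (1 - \alpha)^N$ for $N$ independent tests when all null hypotheses are true. A minimal Python sketch of that arithmetic is shown below; the function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false rejection across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


# Combined Type I error rates, in rounded percent, for 1, 5, and 20 tests.
rates = {n: round(100 * familywise_error_rate(n)) for n in (1, 5, 20)}
```

The resulting values for 5 and 20 tests round to 23 percent and 64 percent, matching the figures in the text.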

The primary output from RCT-YES presents p-values from t-tests that do not correct for multiple 
comparisons. However, RCT-YES also denotes in the output whether statistically significant impact 
estimates remain statistically significant after applying the Benjamini and Hochberg (1995) multiple 
comparisons corrections procedure. These corrections are made for impact estimates for the full 
sample (that are typically the confirmatory analyses for education RCTs), but not for baseline 
subgroup analyses (that are typically exploratory analyses). The multiple comparisons corrections are 
made for all outcome measures within a group (domain) that are specified as inputs into RCT-YES, but 
not across outcome groups. 

The Benjamini and Hochberg (1995) method controls the false discovery rate (FDR), which is the 
expected proportion of all rejected null hypotheses that are rejected erroneously. Stated differently, 
the FDR is the expected fraction of significant test statistics that are false discoveries. Benjamini and 
Hochberg showed that when conducting $N$ tests, the following four-step procedure will control the 
FDR at the $\alpha$ level: 

• Conduct $N$ separate t-tests, each at the common significance level $\alpha$. 

• Order the p-values of the $N$ tests from smallest to largest, where $p_{(1)} \le p_{(2)} \le \dots \le p_{(N)}$ are 
the ordered p-values. 

• Define $k$ as the maximum $j$ for which $p_{(j)} \le \frac{j}{N}\alpha$. 

• Reject all null hypotheses $H_{(j)}$, $j = 1, 2, \dots, k$. If no such $k$ exists, then no hypotheses are 
rejected. 
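
The four steps above can be sketched in a few lines of code. This is an illustration of the Benjamini and Hochberg step-up rule only, not the RCT-YES implementation; the function name and the example p-values are assumptions made for the sketch.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which null hypotheses are rejected
    under the Benjamini-Hochberg step-up procedure at level alpha."""
    n = len(p_values)
    # Step 2: order the tests by p-value, smallest first.
    order = sorted(range(n), key=lambda i: p_values[i])
    # Step 3: find the largest rank j with p_(j) <= (j / N) * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / n * alpha:
            k = rank
    # Step 4: reject the k hypotheses with the smallest p-values.
    rejected = [False] * n
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values for four outcomes in one domain.
    print(benjamini_hochberg([0.001, 0.03, 0.035, 0.5]))
```

Note that a hypothesis can be rejected even if its own p-value exceeds its own threshold, provided a larger ordered p-value meets its threshold; this is what distinguishes the step-up rule from a test-by-test comparison.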

This sequential procedure, which has become increasingly popular in the literature, is easy to use 
because it is based solely on p-values from the individual tests. In addition, it is used by IES’s What 
Works Clearinghouse (WWC) to correct p-values for multiple testing in reviews of education 
research. Schochet (2009) discusses multiple comparisons issues in more detail. 

e. FP and SP models with baseline covariates 

In education RCTs, researchers often estimate impacts using regression models that control for 
covariates that pertain to the pre-randomization period. The use of baseline covariates can improve 
the precision of the estimated ATEs by explaining some of the variance of the outcome measures, 
and can control for observable baseline differences between the treatment and control groups due 
to random chance or missing data. The inclusion of baseline covariates to improve the precision of 
estimated ATEs is particularly important in clustered education RCTs where power is often a 
concern (see, for example, Schochet, 2008). The literature has shown that pre-intervention measures 
of the outcomes (for example, achievement test scores in the period prior to random assignment) are 
especially strong predictors of student achievement outcomes in education research, and such 
measures may be available in SLDS administrative records data. 

Covariates can be included in RCT-YES using the COVARIATES input variable (see Table 2 in 
Chapter 2). A separate list of covariates can be specified for each outcome group (domain) and 
each subgroup. 

This section extends the Neyman-Rubin-Holland framework to allow for the inclusion of baseline 
covariates in the regression models in (5.4) and (5.16) for the FP and SP models. For the analysis, 
we define $\mathbf{x}_i$ to be a $1 \times v$ vector of fixed baseline covariates. Because of randomization, $\mathbf{x}_i$ is not 
indexed by treatment or control status. The $v$ covariates could include pre-intervention measures of 
the outcomes and could be binary or continuous. We assume that students are randomly assigned 
independently of $\mathbf{x}_i$, but blocked designs are considered later in this report. 

Importantly, the covariates, $\mathbf{x}_i$, are irrelevant variables in (5.4) and (5.16), which define the true 
models under the Neyman-Rubin-Holland framework. Thus, the ATE parameters considered above 
for the FP and SP models without covariates pertain also to the models with covariates. This differs 
from typical OLS models where the true behavioral model is assumed to include the covariates. We 
do not need to assume that the true conditional distribution of $y_i$ given $\mathbf{x}_i$ is linear in $\mathbf{x}_i$, as is 
the case with the usual OLS model. 

We consider models where $\mathbf{x}_i$ are included as explanatory variables, but do not add terms where 
the covariates are interacted with $T_i$. The inclusion of $\mathbf{x}_i$-by-$T_i$ interaction terms will increase 
precision under the Neyman-Rubin-Holland model (see, for example, Lin, 2013 and Schochet, 
2010). However, we do not adopt this post-stratification approach for several reasons. First, in social 
policy RCTs, it is uncommon to find that interventions have a meaningful effect on the 
covariate-outcome relationship (see, for example, Table 5 in Schochet, 2010). Second, this approach involves 
a loss in the degrees of freedom, which could be substantial if the model contains many covariates. 
Finally, the inclusion of the interaction terms complicates the analysis, including the asymptotic 
results. Instead, for simplicity and transparency, we adopt a much more common analytic approach 
where all regression specifications include non-interacted baseline covariates only, which will likely 
capture most of the precision gains due to regression adjustment. 

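In what follows, let $\mathbf{z}_i = (1 \;\; T_i \;\; \mathbf{x}_i)$ be a vector of model explanatory variables. The multiple regression 
(MR) estimator for both the FP and SP models can then be expressed as follows: 

(5.21) $\hat{\beta}_{nclus,MR,FP} = \hat{\beta}_{nclus,MR,SP} = \left[ \left( \sum_{i=1}^{n} \mathbf{z}_i' \mathbf{z}_i \right)^{-1} \sum_{i=1}^{n} \mathbf{z}_i' y_i \right]_{(2,2)}$, 

where the subscript picks out the element that corresponds to the coefficient on $T_i$. 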


As discussed next, this ATE estimator is asymptotically unbiased (consistent), but unlike the 
estimators without covariates, it is biased in finite samples. 


Multiple regression estimator for the FP model 

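To examine asymptotic moments under the FP model with fixed covariates, it simplifies the proofs 
to use centered covariates $\tilde{T}_i$ and $\tilde{\mathbf{x}}_i$, where $\tilde{x}_{ik} = (x_{ik} - \bar{x}_k)$ for student $i$ and covariate $k$. Thus, 
we use $\tilde{\mathbf{z}}_i = (1 \;\; \tilde{T}_i \;\; \tilde{\mathbf{x}}_i)$ in (5.21) rather than $\mathbf{z}_i$; this centering has no effect on the parameter estimates 
(apart from the intercept) and has no effect on model fitted values or residuals. Using these centered 
covariates, we assume in addition to (5.7) that as $n$ approaches infinity: 

(5.22) $\frac{1}{n}\sum_{i=1}^{n} \tilde{\mathbf{x}}_i' \tilde{\mathbf{x}}_i \to \boldsymbol{\Omega}_{xx}$, $\quad \frac{1}{n}\sum_{i=1}^{n} \tilde{\mathbf{x}}_i' \alpha_i \to \boldsymbol{\Omega}_{x\alpha}$, $\quad \frac{1}{n}\sum_{i=1}^{n} \tilde{\mathbf{x}}_i' \tau_i \to \boldsymbol{\Omega}_{x\tau}$, 

where $\alpha_i$ and $\tau_i$ are defined as in (5.4a); $\boldsymbol{\Omega}_{xx}$ is a $v \times v$ symmetric, finite, positive definite matrix; 
and $\boldsymbol{\Omega}_{x\alpha}$ and $\boldsymbol{\Omega}_{x\tau}$ are finite $v \times 1$ vectors of fixed real numbers. In these expressions, the covariances 
between the covariates and potential outcomes can differ for treatments and controls. 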

The following lemma uses results in Freedman (2006) and Schochet (2010). The proof is provided 
in Appendix A. 

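Lemma 5.3. Let $\hat{\beta}_{nclus,MR,FP}$ be the multiple regression estimator for $\beta_{nclus,FP}$ and assume (5.7) and (5.22). 
Then, $\hat{\beta}_{nclus,MR,FP}$ is asymptotically normal with asymptotic mean $\beta_{nclus,FP}$ and asymptotic variance: 

(5.23) $\mathrm{AsyVar}_R(\hat{\beta}_{nclus,MR,FP}) = \left[ \frac{S_T^2}{np} + \frac{S_C^2}{n(1-p)} - \frac{S_\tau^2}{n} \right] - \frac{\boldsymbol{\Omega}_{x\alpha}' \boldsymbol{\Omega}_{xx}^{-1} \boldsymbol{\Omega}_{x\alpha}}{np(1-p)} - 2(1-2p) \frac{\boldsymbol{\Omega}_{x\tau}' \boldsymbol{\Omega}_{xx}^{-1} \boldsymbol{\Omega}_{x\alpha}}{np(1-p)}$. 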


The first bracketed term on the right-hand side in (5.23) is the variance estimator under the FP 
model without covariates, so the second and third terms represent precision gains (or losses in rare 
cases) from adding covariates. 

As discussed in Schochet (2010), a conservative variance estimator for (5.23) is as follows: 


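(5.24) $\widehat{\mathrm{AsyVar}}_R(\hat{\beta}_{nclus,MR,FP}) = \left[ \frac{s_T^2}{np} + \frac{s_C^2}{n(1-p)} - \frac{(s_T - s_C)^2}{n} \right] - \frac{\hat{\boldsymbol{\Omega}}_{x\alpha}' \hat{\boldsymbol{\Omega}}_{xx}^{-1} \hat{\boldsymbol{\Omega}}_{x\alpha}}{np(1-p)} - 2(1-2p) \frac{\hat{\boldsymbol{\Omega}}_{x\tau}' \hat{\boldsymbol{\Omega}}_{xx}^{-1} \hat{\boldsymbol{\Omega}}_{x\alpha}}{np(1-p)}$, 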


where the covariance matrices are estimated using sample moments: 

(5.25) 


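In (5.25), $\mathbf{s}_{xT}$ and $\mathbf{s}_{xC}$ are $v \times 1$ vectors of sample covariances between $\mathbf{x}_i$ and $y_i$ for treatments 
and controls, respectively: 

$s_{xT}(k) = \frac{1}{(n-1)p} \sum_{i:T_i=1} \tilde{x}_{ik} (y_i - \bar{y}_T)$ and $s_{xC}(k) = \frac{1}{(n-1)(1-p)} \sum_{i:T_i=0} \tilde{x}_{ik} (y_i - \bar{y}_C)$. 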

This estimation approach becomes cumbersome for other, more complex designs considered later 
in this report that contain additional variance terms. Thus, RCT-YES instead estimates (5.24) using 
a similar variance estimator to the following one suggested by Imbens and Rubin (2015) that is based 
on regression residuals: 

(5.26) $\widehat{\mathrm{AsyVar}}_R(\hat{\beta}_{nclus,MR,FP}) = \frac{\sum_{i=1}^{n} (T_i - p)^2 \left( y_i - \hat{\beta}_0 - \hat{\beta}_{nclus,MR,FP}(T_i - p) - \tilde{\mathbf{x}}_i \hat{\boldsymbol{\gamma}} \right)^2}{n(n-1-v)[p(1-p)]^2}$, 

where $\hat{\beta}_0$, $\hat{\beta}_{nclus,MR,FP}$, and $\hat{\boldsymbol{\gamma}}$ are parameter estimates from an OLS regression of $y_i$ on 
$\tilde{\mathbf{z}}_i = (1 \;\; \tilde{T}_i \;\; \tilde{\mathbf{x}}_i)$. Equivalently, (5.26) can be calculated using the residuals from a regression model 
that includes the non-centered variables $\mathbf{z}_i = (1 \;\; T_i \;\; \mathbf{x}_i)$. The rationale for this estimator will become 
apparent from Lemma 5.4 below for the SP model. 


For the calculations, RCT-YES uses a new, slightly modified version of (5.26) that includes the FP 
heterogeneity term, and that ensures when v = 0 , the variance estimator with covariates reduces to 
the variance estimator without covariates in (5.10). This estimator can be expressed as follows: 


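(5.26a) $\widehat{\mathrm{AsyVar}}_R(\hat{\beta}_{nclus,MR,FP}) = \frac{MSE_T}{np} + \frac{MSE_C}{n(1-p)} - \frac{(\sqrt{MSE_T} - \sqrt{MSE_C})^2}{n}$, where 

$MSE_T = \frac{1}{(n-v)p - 1} \sum_{i:T_i=1} \left( y_i - \hat{\beta}_0 - \hat{\beta}_{nclus,MR,FP}(1-p) - \tilde{\mathbf{x}}_i \hat{\boldsymbol{\gamma}} \right)^2$ 

$MSE_C = \frac{1}{(n-v)(1-p) - 1} \sum_{i:T_i=0} \left( y_i - \hat{\beta}_0 + \hat{\beta}_{nclus,MR,FP}\, p - \tilde{\mathbf{x}}_i \hat{\boldsymbol{\gamma}} \right)^2$ 

are regression mean square errors for the treatment and control groups, respectively. For this 
estimator, losses in the number of degrees of freedom due to the inclusion of covariates in the model 
are split proportionately between the treatment and control groups. This estimator uses regression 
mean square errors for the FP heterogeneity term rather than the $(s_T - s_C)^2 / n$ term to ensure that 
the variance estimators will be positive. 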

RCT-YES conducts hypothesis tests using t-tests with $(n_T + n_C - v - 2)$ degrees of freedom, where $v$ 
is the number of baseline covariates. 
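
As an illustration of the residual-based approach described above, the sketch below fits the OLS regression of the outcome on an intercept, the treatment indicator, and the covariates, then combines group-specific mean square errors into a variance estimate with the FP heterogeneity term and degrees of freedom split as the text describes. It is a hedged, standard-library sketch and not the RCT-YES code: the function names, the input layout, and the small linear solver are assumptions made for the example.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - tail) / m[r][r]
    return x


def mr_impact(y, t, x):
    """Return (impact estimate, FP variance estimate, t-test degrees of freedom).

    y: outcomes; t: 0/1 treatment indicators; x: one covariate row per student
    (pass empty rows for a model without covariates).
    """
    n, v = len(y), len(x[0])
    p = sum(t) / n
    # Non-centered design rows (1, T_i, x_i); residuals match the centered model.
    z = [[1.0, float(t[i])] + [float(c) for c in x[i]] for i in range(n)]
    k = v + 2
    ztz = [[sum(row[a] * row[b] for row in z) for b in range(k)] for a in range(k)]
    zty = [sum(z[i][a] * y[i] for i in range(n)) for a in range(k)]
    coef = solve(ztz, zty)
    resid = [y[i] - sum(coef[a] * z[i][a] for a in range(k)) for i in range(n)]
    # Group mean square errors; covariate df losses are split proportionately.
    mse_t = sum(r * r for r, ti in zip(resid, t) if ti == 1) / ((n - v) * p - 1)
    mse_c = sum(r * r for r, ti in zip(resid, t) if ti == 0) / ((n - v) * (1 - p) - 1)
    variance = (mse_t / (n * p) + mse_c / (n * (1 - p))
                - (math.sqrt(mse_t) - math.sqrt(mse_c)) ** 2 / n)
    return coef[1], variance, n - v - 2
```

With no covariates (empty rows), the impact reduces to the simple difference in means and the mean square errors reduce to the group sample variances, which is the property the text attributes to this estimator when $v = 0$.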


Multiple regression estimator for the SP model 

The asymptotic moments for the multiple regression estimator for the SP model, $\hat{\beta}_{nclus,MR,SP}$, can be 
calculated from the FP estimator using the same conditioning arguments as for the model without 
covariates. First, using the law of iterated expectations, we find that 
$E_I(E_R(\hat{\beta}_{nclus,MR,SP}))$ converges to $E_I(\beta_{nclus,FP}) = \beta_{nclus,SP}$. Similarly, using the law of total variance in (5.12), as $n$ 
gets large, we have that (1) $Var_I(E_R(\hat{\beta}_{nclus,MR,SP})) = \sigma_{\tau I}^2 / n$ and (2) $E_I(Var_R(\hat{\beta}_{nclus,MR,SP}))$ equals the 
expectation of the variance expression in (5.23). The following lemma formalizes these results; the 
proof is provided in Appendix A and follows Schochet (2010). 

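Lemma 5.4. Let $\hat{\beta}_{nclus,MR,SP}$ be the multiple regression estimator for $\beta_{nclus,SP} = (\mu_{TI} - \mu_{CI})$. Then, 
$\hat{\beta}_{nclus,MR,SP}$ is asymptotically normal with asymptotic mean $\beta_{nclus,SP}$ and asymptotic variance: 

(5.27) $\mathrm{AsyVar}_{RI}(\hat{\beta}_{nclus,MR,SP}) = \frac{E_{RI}\left[ \tilde{T}_i^2 \left( y_i - \beta_0 - \beta_{nclus,SP} \tilde{T}_i - \tilde{\mathbf{x}}_i \boldsymbol{\gamma} \right)^2 \right]}{n p^2 (1-p)^2}$ 

$= \left[ \frac{\sigma_{TI}^2}{np} + \frac{\sigma_{CI}^2}{n(1-p)} \right] - \frac{\boldsymbol{\Lambda}_{x\alpha}' \boldsymbol{\Lambda}_{xx}^{-1} \boldsymbol{\Lambda}_{x\alpha}}{np(1-p)} - 2(1-2p) \frac{\boldsymbol{\Lambda}_{x\tau}' \boldsymbol{\Lambda}_{xx}^{-1} \boldsymbol{\Lambda}_{x\alpha}}{np(1-p)}$, 

where $\boldsymbol{\gamma} = \boldsymbol{\Lambda}_{xx}^{-1} \boldsymbol{\Lambda}_{x\alpha}$, and $\boldsymbol{\Lambda}_{xx} = E_I(\frac{1}{n}\sum_{i=1}^{n} \tilde{\mathbf{x}}_i' \tilde{\mathbf{x}}_i)$, $\boldsymbol{\Lambda}_{x\alpha} = E_I(\frac{1}{n}\sum_{i=1}^{n} \tilde{\mathbf{x}}_i' \alpha_i)$, and $\boldsymbol{\Lambda}_{x\tau} = E_I(\frac{1}{n}\sum_{i=1}^{n} \tilde{\mathbf{x}}_i' \tau_i)$ are moment 
matrices under the joint super-population distribution for the covariates and potential outcomes. 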

RCT-YES estimates the variance in (5.27) using (5.26a) from above, excluding the FP heterogeneity 
term. This variance estimator is based on the squared expectation term after the first equality sign 
in (5.27). 

Even if users request regression-adjusted estimates, we urge users to also estimate impacts using 
simple differences-in-means methods; the two sets of estimates should be carefully compared and 
large differences should be resolved (for example, they could indicate data problems with the 
covariates). RCT-YES reports regression $R^2$ values that are important for understanding the gains in 
precision due to the inclusion of baseline covariates and that can provide information on a key 
parameter for statistical power calculations for designing future education RCTs. 


f. Subgroup analysis 

In education RCTs, researchers often estimate ATEs for baseline subgroups defined by pre-intervention 
student, teacher, and school characteristics. For instance, researchers may be interested 
in assessing whether intervention effects differ by gender, test score levels in the prior year, educator 
experience, school size, and/or school urban/rural status. These analyses can be used to assess the 
extent to which treatment effects vary across policy-relevant subpopulations. Results from subgroup 
analyses can help inform decisions about how to best target specific interventions, and possibly to 
suggest ways to improve the design or implementation of the tested interventions. 

RCT-YES conducts subgroup analyses for categorical subgroups where each sample member is 
allocated to a discrete, mutually exclusive category (for example, 1 = not proficient in math in the 
prior year; 2 = proficient in math; and 3 = highly proficient in math). Subgroups for RCT-YES cannot 
be continuous variables, but users can re-define such variables (for example, prior year test scores) as 
categorical subgroup variables for analysis. RCT-YES will conduct subgroup analyses only if each 
specified subgroup category has a sufficient sample size to protect PII (see Chapter 3). If a subgroup 
category is too small, it can be grouped with another subgroup category or omitted from the analysis. 

In this section, we discuss design-based methods used in RCT-YES to estimate ATEs for baseline 
subgroups, including statistical tests for assessing differences across subgroup impacts. First, we 
discuss estimation methods using the FP and SP models without covariates (the simple differences- 
in-means estimators) and then using regression estimators with covariates. 

Subgroup FP and SP models without baseline covariates 

Using Assumption (4.2), random assignment ensures that $T_i \perp\!\!\!\perp (Y_i(1), Y_i(0))$ conditional on any 
covariate value defined by pre-randomization characteristics. Thus, we can use very similar 
design-based methods to those for the full sample to estimate ATEs separately for each subgroup of interest. 
The only difference is that the numbers of students in a subgroup who are randomized to the 
treatment and control groups are now random variables (with a hypergeometric distribution). For 
example, suppose that $n = 100$, $p = .5$, and that 30 percent of students in the sample are male. In 
this case, we would expect that 15 males would be randomly assigned to each research group. 
However, there is a variance around this gender allocation so that the observed treatment (control) 
group sample might contain fewer or greater than 15 males. We assume for subgroup analyses that 
there are at least 2 treatments and 2 controls in the subgroup sample to allow for variance estimation. 
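
The gender example above can be checked numerically. The sketch below, which uses only the Python standard library and hypothetical function names, computes the hypergeometric distribution of the number of males assigned to the treatment group when 50 of 100 students are treated and 30 students are male.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes among `draws` units sampled without replacement)."""
    return (comb(successes, k) * comb(population - successes, draws - k)
            / comb(population, draws))


def male_treatment_distribution(n=100, n_male=30, n_treat=50):
    """Mean and variance of the number of males randomized to treatment."""
    pmf = [hypergeom_pmf(k, n, n_male, n_treat) for k in range(n_male + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return pmf, mean, variance


if __name__ == "__main__":
    pmf, mean, variance = male_treatment_distribution()
    print(f"expected males in treatment group: {mean:.1f}")
    print(f"standard deviation: {variance ** 0.5:.2f}")
    print(f"P(exactly 15 males): {pmf[15]:.3f}")
```

The expected count is 15, as stated in the text, and the spread around it is what makes the subgroup sample sizes random rather than fixed.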

We denote subgroups using the subscript “$g$.” Thus, for example, $n_g$ is the number of students in 
the sample in subgroup $g$, $n_{Tg}$ and $n_{Cg}$ are respective treatment and control subgroup sample sizes, 
and similarly for the other variables defined above. Let $G_{ig}$ be a subgroup indicator variable that 
equals 1 if student $i$ is in subgroup $g$ and 0 otherwise. We can then define the ATE parameter for 
subgroup $g$ for the FP model as follows: 

(5.28) $\beta_{nclus,g,FP} = \bar{Y}_g(1) - \bar{Y}_g(0) = \frac{1}{n_g} \sum_{i:G_{ig}=1} (Y_i(1) - Y_i(0))$, 

where $\bar{Y}_g(1)$ and $\bar{Y}_g(0)$ are mean potential outcomes for the subgroup. 

Under the SP model, the potential outcomes for subgroup $g$ are assumed to be random draws from 
subgroup-specific potential outcome distributions in the super-population with finite means $\mu_{TIg}$ 
and $\mu_{CIg}$ and finite variances $\sigma_{TIg}^2$ and $\sigma_{CIg}^2$. Thus, the ATE parameter for the SP model is 

(5.29) $\beta_{nclus,g,SP} = \mu_{TIg} - \mu_{CIg}$. 

Consider the differences-in-means estimator for subgroup g for both the FP and SP models: 

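(5.30) $\hat{\beta}_{nclus,g,FP} = \hat{\beta}_{nclus,g,SP} = (\bar{y}_{Tg} - \bar{y}_{Cg}) = \frac{\sum_{i:G_{ig}=1} T_i Y_i(1)}{\sum_{i:G_{ig}=1} T_i} - \frac{\sum_{i:G_{ig}=1} (1 - T_i) Y_i(0)}{\sum_{i:G_{ig}=1} (1 - T_i)}$. 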

The denominators in this estimator are random, unlike the estimators for the full sample where the 
treatment and control sample sizes are fixed. Thus, (5.30) is a ratio estimator where both the 
numerator and denominator terms are random variables. As discussed below, this simple differences- 
in-means estimator is unbiased and asymptotically normal. 

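To show that $\hat{\beta}_{nclus,g,FP}$ is unbiased, we follow the approach used by Miratrix, Sekhon, and Yu (2013) 
in a different context by conditioning on the observed subgroup sample sizes, $n_{Tg} = \sum_{i:G_{ig}=1} T_i$ and 
$n_{Cg} = \sum_{i:G_{ig}=1} (1 - T_i)$, and then averaging over all subgroup allocations (A) to the treatment and 
control groups: 

(5.31) $E_{AR}(\hat{\beta}_{nclus,g,FP}) = E_A\left[ E_R\left( \sum_{i:G_{ig}=1} T_i Y_i(1) / n_{Tg} - \sum_{i:G_{ig}=1} (1 - T_i) Y_i(0) / n_{Cg} \;\Big|\; n_{Tg}, n_{Cg} \right) \right]$ 

$= E_A\left[ \sum_{i:G_{ig}=1} Y_i(1) / n_g - \sum_{i:G_{ig}=1} Y_i(0) / n_g \right] = \beta_{nclus,g,FP}$. 

This expression holds because $E_R(T_i \mid n_{Tg}, n_{Cg}) = (n_{Tg} / n_g)$ and $E_R((1 - T_i) \mid n_{Tg}, n_{Cg}) = (n_{Cg} / n_g)$. 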

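Using a similar conditioning argument, we can calculate $Var_{AR}(\hat{\beta}_{nclus,g,FP})$ using the law of total 
variance in (5.12). As $n$ approaches infinity, $Var_A(E_R(\hat{\beta}_{nclus,g,FP})) = Var_A(\beta_{nclus,g,FP}) = 0$. Thus, if we 
apply Lemma 5.1 for subgroup $g$ conditional on subgroup sample sizes, we have that 

(5.32) $Var_{AR}(\hat{\beta}_{nclus,g,FP}) = E_A(Var_R(\hat{\beta}_{nclus,g,FP})) = S_{Tg}^2 E_A\left(\frac{1}{n_{Tg}}\right) + S_{Cg}^2 E_A\left(\frac{1}{n_{Cg}}\right) - \frac{S_{\tau g}^2}{n_g}$, 

where $S_{Tg}^2$ is the variance of treatment group potential outcomes 
for subgroup $g$, and $S_{Cg}^2$ and $S_{\tau g}^2$ are defined analogously. 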

To estimate the unconditional variance in (5.32), an asymptotic expansion can be used to 
approximate $E_A(1 / n_{Tg})$ using $(1 / E_A(n_{Tg})) = (1 / npq_g) = (1 / n_g p)$, where $q_g = (n_g / n)$ is the 
proportion of the total sample in subgroup $g$, and similarly for $E_A(1 / n_{Cg})$, which can be 
approximated using $(1 / E_A(n_{Cg})) = (1 / n_g(1 - p))$. However, as discussed in Efron and Hinkley 
(1978) and Ghosh, Reid and Fraser (2010), a more accurate variance estimator can be obtained by 
conditioning on the subgroup sample sizes. The rationale for this approach is that conditioning on 
the ancillary statistics $n_{Tg}$ and $n_{Cg}$ (that are uninformative about the ATE parameter) yields 
conditional variance estimators that are more germane to the observed data than the unconditional 
variance estimator. Thus, the variance estimator for the FP model used in RCT-YES is 


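(5.33) $\widehat{Var}_R(\hat{\beta}_{nclus,g,FP}) = \frac{s_{Tg}^2}{n_g p_g} + \frac{s_{Cg}^2}{n_g (1 - p_g)} - \frac{(s_{Tg} - s_{Cg})^2}{n_g}$, where 

$s_{Tg}^2 = \frac{1}{n_{Tg} - 1} \sum_{i:G_{ig}=1, T_i=1} (y_i - \bar{y}_{Tg})^2$ and $s_{Cg}^2 = \frac{1}{n_{Cg} - 1} \sum_{i:G_{ig}=1, T_i=0} (y_i - \bar{y}_{Cg})^2$ 

are sample variances for subgroup $g$ and $p_g = n_{Tg} / n_g$ is the observed proportion of treatments in 
subgroup $g$. This variance estimator, which is based on actual subgroup sample sizes rather than expected 
ones, has the same form as the variance estimator for the full sample. RCT-YES uses the same 
variance estimator for the SP model, excluding the FP heterogeneity term. 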

RCT-YES conducts hypothesis testing for subgroup analyses using t-tests with $(n_{Tg} + n_{Cg} - 2)$ degrees 
of freedom. The Benjamini and Hochberg multiple comparisons adjustments are not applied for 
subgroup analyses, which are typically exploratory analyses. 
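
A short sketch of the subgroup calculation may help make the conditioning idea concrete: the estimator and its variance are computed from the observed treatment and control counts in the subgroup, not the expected ones. The code assumes, in line with the description of the full-sample estimator earlier in this chapter, that the FP heterogeneity term is the squared difference of the group standard deviations divided by the subgroup size; the function name and input layout are illustrative assumptions.

```python
import statistics


def subgroup_impact(y, t, g, level, finite_population=True):
    """Differences-in-means impact for one subgroup level.

    Returns (impact, variance estimate, t-test degrees of freedom).
    y: outcomes; t: 0/1 treatment indicators; g: subgroup labels.
    """
    y_t = [yi for yi, ti, gi in zip(y, t, g) if gi == level and ti == 1]
    y_c = [yi for yi, ti, gi in zip(y, t, g) if gi == level and ti == 0]
    n_tg, n_cg = len(y_t), len(y_c)
    if n_tg < 2 or n_cg < 2:
        raise ValueError("need at least 2 treatments and 2 controls in the subgroup")
    s2_t, s2_c = statistics.variance(y_t), statistics.variance(y_c)
    # Conditional variance: uses the observed subgroup sample sizes.
    variance = s2_t / n_tg + s2_c / n_cg
    if finite_population:
        # FP heterogeneity term; dropped under the SP model.
        variance -= (s2_t ** 0.5 - s2_c ** 0.5) ** 2 / (n_tg + n_cg)
    impact = statistics.mean(y_t) - statistics.mean(y_c)
    return impact, variance, n_tg + n_cg - 2
```

Calling the function with `finite_population=False` gives the SP version, which omits the heterogeneity term and is therefore never smaller than the FP version.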


Testing for ATE differences across subgroups 

It is good research practice to downplay significant findings for individual subgroups if there is no 
statistically significant evidence of a difference in subgroup estimates (see, for example, Bloom and 
Michalopoulos, 2013; Rothwell, 2005; and Schochet, 2009). For example, statistically significant 
findings for English language learner (ELL) students should not be emphasized if there is no 
evidence of a difference in effects between ELL and non-ELL students. The practice of examining 
differences in subgroup impacts is especially important if policymakers aim to use evaluation 
findings to target services to those who can most benefit from them. (It should be noted that 
detecting a statistically significant difference in an effect is difficult because of the lower statistical 
power of tests of differences.) 

RCT-YES conducts a chi-squared test to test the null hypothesis of no differences in treatment effects 
across a subgroup category with $s$ levels (for example, $s = 2$ for girls and boys or $s = 3$ for three 
categories of achievement test scores in the year prior to random assignment). Let $\hat{\boldsymbol{\lambda}}$ be an $s \times 1$ vector 
of ATE estimates for a subgroup category with an associated estimated variance-covariance matrix 
$\hat{\boldsymbol{\Phi}}_\lambda$. Note that $\hat{\boldsymbol{\Phi}}_\lambda$ is diagonal due to the independence of the subgroup ATE estimates, which 
occurs for the FP model because 

$E_A E_R\left[ (\hat{\beta}_{nclus,g,FP} - \beta_{nclus,g,FP})(\hat{\beta}_{nclus,g',FP} - \beta_{nclus,g',FP}) \mid n_{Tg}, n_{Cg}, n_{Tg'}, n_{Cg'} \right]$ 

$= E_A\left[ E_R\left( (\hat{\beta}_{nclus,g,FP} - \beta_{nclus,g,FP}) \mid n_{Tg}, n_{Cg} \right) E_R\left( (\hat{\beta}_{nclus,g',FP} - \beta_{nclus,g',FP}) \mid n_{Tg'}, n_{Cg'} \right) \right] = 0$. 

The same argument holds for the SP model. Using results from above, $\hat{\boldsymbol{\lambda}}$ is asymptotically normal 
with mean $\boldsymbol{\lambda}$ and variance $\boldsymbol{\Phi}_\lambda$. Construct the $(s - 1) \times s$ matrix $\mathbf{R}$ as the $s \times s$ identity matrix where 
the last column is replaced by a column of $-1$s and the last row is deleted. The chi-squared statistic 
to test for subgroup differences can then be calculated using 

(5.35) ChiSq-Subgroup $= (\mathbf{R}\hat{\boldsymbol{\lambda}})' (\mathbf{R}\hat{\boldsymbol{\Phi}}_\lambda \mathbf{R}')^{-1} (\mathbf{R}\hat{\boldsymbol{\lambda}})$. 

This statistic is distributed as $\chi^2$ with $(s - 1)$ degrees of freedom. 

RCT-YES displays p-values from the chi-squared tests for each subgroup category of interest, along 
with t-test results for each subgroup level. 
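
The chi-squared statistic for subgroup differences can be sketched as follows for the case described in the text, where the subgroup impact estimates are independent so that their variance-covariance matrix is diagonal. The contrast matrix compares each of the first $s - 1$ levels with the last level. The function names and the small linear solver are assumptions made for this standard-library illustration; it computes the statistic and its degrees of freedom but leaves the p-value lookup to a statistics package.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[r]] for r, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (m[r][k] - tail) / m[r][r]
    return x


def chi_sq_subgroup(estimates, variances):
    """Chi-squared statistic and degrees of freedom for equality of subgroup impacts.

    estimates: impact estimate for each of the s subgroup levels.
    variances: estimated variance of each impact (independent estimates assumed).
    """
    s = len(estimates)
    # R * lambda: each of the first s-1 levels minus the last level.
    contrast = [estimates[a] - estimates[-1] for a in range(s - 1)]
    # R * Phi * R' for diagonal Phi: diag(phi_a) plus phi_s in every cell.
    cov = [[(variances[a] if a == b else 0.0) + variances[-1]
            for b in range(s - 1)] for a in range(s - 1)]
    weights = solve(cov, contrast)
    statistic = sum(c * w for c, w in zip(contrast, weights))
    return statistic, s - 1
```

For two levels the statistic reduces to the squared difference in impacts divided by the sum of the two variances, which is the square of the familiar z-statistic for a difference between independent estimates.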

Subgroup FP and SP models with baseline covariates 

One approach for including baseline covariates in the subgroup impact analysis is to estimate 
separate regression models for each subgroup. RCT-YES does not adopt this approach, however, 
because for small subgroups, degrees of freedom losses could reduce statistical power and collinearity 
among the covariates could complicate the estimation. Instead, RCT-YES estimates a full-sample 
regression model where the explanatory variables include the baseline covariates as well as 
subgroup-by-treatment status interaction terms ($G_{ig} T_i$ terms). 

To examine this regression approach using the Neyman-Rubin-Holland model (which to our 
knowledge has not been formally addressed in the literature), we can use the relation 
$y_i = \sum_{g=1}^{s} G_{ig} y_i$ to parameterize the regression model under the Neyman-Rubin-Holland model 
as follows: 

(5.36) $y_i = \sum_{g=1}^{s} \beta_g G_{ig} \tilde{T}_i + \sum_{g=1}^{s} \delta_g G_{ig} + u_i$, 

where $u_i$ is the error term, which takes a different form for the FP and SP models. Note in this model 
that we include subgroup indicator and 
interaction terms for each subgroup level and exclude the $\tilde{T}_i$ and intercept terms. The benefit of 
this model formulation is that $\beta_g$ is the subgroup ATE parameter for subgroup $g$ (that is, 
$\beta_g = \beta_{nclus,g,FP}$ for the FP model and $\beta_g = \beta_{nclus,g,SP}$ for the SP model). Furthermore, $G_{ig} \tilde{T}_i$ is 
orthogonal to the other model explanatory variables, which simplifies the proofs for examining the 
statistical properties of the ATE estimators. 

We now consider the asymptotic moments of the multiple regression estimator for $\beta_g$ in (5.36), 
where the $\mathbf{x}_i$ covariates are included in the model with associated parameter vector $\boldsymbol{\gamma}$. For ease of 
presentation, we focus on the SP model; differences between results for the FP and SP models are 
similar to those from above. The following lemma is proved in Appendix A and is not found in 
the literature. 

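Lemma 5.5. Let $\hat{\beta}_g = \hat{\beta}_{nclus,g,MR,SP}$ be the multiple regression estimator for $\beta_{nclus,g,SP} = (\mu_{TIg} - \mu_{CIg})$ in 
(5.36), where the model includes the baseline covariates $\mathbf{x}_i$. Then, $\hat{\beta}_{nclus,g,MR,SP}$ is asymptotically normal with 
asymptotic mean $\beta_{nclus,g,SP}$ and asymptotic variance: 

(5.37) $\mathrm{AsyVar}_{RI}(\hat{\beta}_{nclus,g,MR,SP}) = \left[ \frac{\sigma_{TIg}^2}{npq_g} + \frac{\sigma_{CIg}^2}{n(1-p)q_g} \right] + \frac{\mathbf{B}' \boldsymbol{\Lambda}_{xxg} \mathbf{B}}{np(1-p)q_g} - \frac{2\left[ \boldsymbol{\Lambda}_{x\alpha g}' \mathbf{B} + (1 - 2p) \boldsymbol{\Lambda}_{x\tau g}' \mathbf{B} \right]}{np(1-p)q_g}$, 

where $\mathbf{B} = \boldsymbol{\Lambda}_{xx}^{-1} \boldsymbol{\Lambda}_{x\alpha}$; $\boldsymbol{\Lambda}_{xxg} = E_I(\frac{1}{n_g}\sum_{i:G_{ig}=1} \tilde{\mathbf{x}}_i' \tilde{\mathbf{x}}_i)$, $\boldsymbol{\Lambda}_{x\alpha g} = E_I(\frac{1}{n_g}\sum_{i:G_{ig}=1} \tilde{\mathbf{x}}_i' \alpha_i)$, and $\boldsymbol{\Lambda}_{x\tau g} = E_I(\frac{1}{n_g}\sum_{i:G_{ig}=1} \tilde{\mathbf{x}}_i' \tau_i)$ are moment matrices 
under the joint super-population distributions for the covariates and potential outcomes; $\boldsymbol{\Lambda}_{xx}$ and 
$\boldsymbol{\Lambda}_{x\alpha}$ are defined in Lemma 5.4; and $q_g = n_g / n$. 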

One direct approach for obtaining a consistent estimator for the variance in (5.37) that we discussed 
for the full sample analysis is to estimate the separate pieces $\mathbf{B}$, $\boldsymbol{\Lambda}_{xxg}$, $\boldsymbol{\Lambda}_{x\alpha g}$, and $\boldsymbol{\Lambda}_{x\tau g}$ using sample 
moment matrices. However, due to the relatively large number of pieces, RCT-YES instead estimates 
(5.37) using residuals from the regression model in (5.36) where the treatment status indicator is 
centered using $(T_i - p_g)$ instead of $\tilde{T}_i$: 

(5.38) $\widehat{\mathrm{AsyVar}}_{RI}(\hat{\beta}_g) = \frac{MSE_{Tg}}{n_g p_g} + \frac{MSE_{Cg}}{n_g (1 - p_g)}$, where 

$MSE_{Tg} = \frac{1}{(n_g - vq_g)p_g - 1} \sum_{i:G_{ig}=1, T_i=1} \hat{u}_i^2$ and $MSE_{Cg} = \frac{1}{(n_g - vq_g)(1 - p_g) - 1} \sum_{i:G_{ig}=1, T_i=0} \hat{u}_i^2$. 

In this expression, $MSE_{Tg}$ and $MSE_{Cg}$ are regression mean square errors for the treatment and 
control groups, respectively, $\hat{u}_i$ is the regression residual for student $i$, $p_g = (n_{Tg} / n_g)$, and $q_g = (n_g / n)$. The degrees of freedom losses due to 
the inclusion of model covariates are spread proportionally across the subgroups. RCT-YES uses the 
same variance expression for the FP model, except that it subtracts off the FP heterogeneity term 
$(\sqrt{MSE_{Tg}} - \sqrt{MSE_{Cg}})^2 / n_g$. 

RCT-YES conducts hypothesis testing for the subgroup analysis with covariates using t-tests with 
$(n_{Tg} + n_{Cg} - vq_g - 2)$ degrees of freedom. Note that the ATE estimators, $\hat{\beta}_g$, are independent across 
subgroups in large samples; thus, the chi-squared tests described above for the simple 
differences-in-means estimators can be used for the regression estimators to test the null hypothesis of no 
differences in treatment effects across subgroups. 


g. Assessing baseline equivalence 

To strengthen the credibility of RCT findings, it is good research practice to demonstrate baseline 
equivalence of the treatment and control groups using observable baseline data. The What Works 
Clearinghouse (WWC) often requires that RCTs demonstrate baseline equivalence for their analytic 
samples in order for the studies to meet WWC design standards with or without reservations (see 
the WWC Procedures and Standards Handbook, Version 3, 2014). 

To assess baseline equivalence, RCT-YES conducts t-tests for each baseline covariate specified in the 
BASE_EQUIV input variable. The analysis uses the full sample with non-missing data for the 
considered covariate and is conducted separately for each outcome measure. 

The t-tests are conducted assuming equal variances for the treatment and control groups because the 
baseline covariates are measured prior to random assignment. For baseline covariate k , RCT-YES 
calculates the following t-statistic for both the FP and SP models: 
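
(5.39) $t_k = \hat{\delta}_k \Big/ \sqrt{s_k^2 \left( \frac{1}{n_T} + \frac{1}{n_C} \right)}$, where $\hat{\delta}_k = (\bar{x}_{Tk} - \bar{x}_{Ck})$, 

$s_k^2 = \frac{(n_T - 1) s_{Tk}^2 + (n_C - 1) s_{Ck}^2}{n_T + n_C - 2}$, $\quad s_{Tk}^2 = \frac{1}{n_T - 1} \sum_{i:T_i=1} (x_{ik} - \bar{x}_{Tk})^2$, and $s_{Ck}^2 = \frac{1}{n_C - 1} \sum_{i:T_i=0} (x_{ik} - \bar{x}_{Ck})^2$. 

The degrees of freedom for the t-tests is $(n_T + n_C - 2)$. RCT-YES also conducts an F-test to test the 
hypothesis that covariate means are jointly similar. This test accounts for potential dependencies 
among the covariates and multiple testing issues. The joint test is based on the following chi-squared 
statistic: 

(5.40) ChiSq-BaseDiff $= \hat{\boldsymbol{\delta}}' \hat{\mathbf{V}}_\delta^{-1} \hat{\boldsymbol{\delta}}$, 

where $\hat{\boldsymbol{\delta}}$ is a $v \times 1$ vector containing the $\hat{\delta}_k$ estimates and 

(5.40a) $\hat{V}_\delta(k, k') = \frac{(n_T - 1) s_T^2(k, k') + (n_C - 1) s_C^2(k, k')}{n_T + n_C - 2} \left[ \frac{1}{n_T} + \frac{1}{n_C} \right]$ 

is the estimated variance-covariance matrix where 

$s_T^2(k, k') = \frac{1}{n_T - 1} \sum_{i:T_i=1} (x_{ik} - \bar{x}_{Tk})(x_{ik'} - \bar{x}_{Tk'})$ and $s_C^2(k, k') = \frac{1}{n_C - 1} \sum_{i:T_i=0} (x_{ik} - \bar{x}_{Ck})(x_{ik'} - \bar{x}_{Ck'})$. 

ChiSq-BaseDiff is distributed as $\chi^2$ with $v$ degrees of freedom. RCT-YES uses the more familiar 
Hotelling’s T-squared statistic for the joint test: 

(5.40b) $\hat{\boldsymbol{\delta}}' \hat{\mathbf{V}}_\delta^{-1} \hat{\boldsymbol{\delta}} \, (n_T + n_C - v - 1) / [(n_T + n_C - 2) v]$, 

which is distributed as $F(v, n_T + n_C - v - 1)$. The joint test is conducted (1) using the sample without 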

missing values for any of the covariates and (2) only if the number of observations per covariate is 
smaller than the OBS_COV input variable (which has a default value of 5; see Section Ij below). 
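
The baseline equivalence calculations can be sketched with the standard library. The first function below computes the pooled-variance t-statistic for one covariate, and the second rescales a joint chi-squared statistic into the F-statistic form of Hotelling's T-squared described in the text. The function names are illustrative assumptions, and the chi-squared value passed to the second function is taken as given instead of being computed from a covariance matrix.

```python
import math
import statistics


def baseline_t_stat(x_t, x_c):
    """Equal-variance t-statistic for a treatment-control difference in one
    baseline covariate. Returns (t-statistic, degrees of freedom)."""
    n_t, n_c = len(x_t), len(x_c)
    diff = statistics.mean(x_t) - statistics.mean(x_c)
    pooled = ((n_t - 1) * statistics.variance(x_t)
              + (n_c - 1) * statistics.variance(x_c)) / (n_t + n_c - 2)
    return diff / math.sqrt(pooled * (1 / n_t + 1 / n_c)), n_t + n_c - 2


def hotelling_f(chi_sq, n_t, n_c, v):
    """Convert the joint chi-squared statistic for v covariates into an
    F-statistic. Returns (F-statistic, (numerator df, denominator df))."""
    f_stat = chi_sq * (n_t + n_c - v - 1) / ((n_t + n_c - 2) * v)
    return f_stat, (v, n_t + n_c - v - 1)


if __name__ == "__main__":
    # Hypothetical prior-year scores for four treatment and four control students.
    print(baseline_t_stat([3, 5, 7, 9], [1, 2, 3, 6]))
    print(hotelling_f(10.0, 30, 30, 4))
```

The equal-variance form is appropriate here for the reason given in the text: the covariates are measured before random assignment, so the two groups share the same population variance by design.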


h. Treatment of missing outcome data and the use of nonresponse weights 

RCT-YES requires an input data file with one record per student in the randomized sample (or per 
educator if the intervention targets educators and their outcomes). The input data file should 
include records with missing data, although there may be instances where the data file only contains 
records for those with nonmissing data. This might be the case, for example, if the administrative 
records used to define the sample only contain test score data on third grade students who completed 
the test, but not for test noncompleters. 

By default, RCT-YES estimates ATEs for a particular outcome using only those observations that have 
non-missing values for that outcome (respondents). The program does not impute outcomes, for example using 
multiple imputation (Rubin, 1987; Schafer, 1997), but deletes cases with missing outcome data 
(nonrespondents). We adopted this approach for RCT-YES for several reasons. First, in a large 
simulation study, Puma, Olsen, Bell, and Price (2009) found that case deletion performs reasonably 
well relative to other missing data adjustment methods for education RCTs that focus on test score 
outcomes. Second, case deletion is simple to apply and understand and limits user input for running 
the program. 

As discussed more formally in this section, case deletion will yield unbiased ATE estimates if the 
underlying factors generating the missing data are unrelated to the intervention effects. This would 
occur, for example, under the Missing Completely at Random (MCAR) assumption (Rubin, 1987) 
that the missing data mechanisms are random for both the treatment and control groups. 

RCT-YES can also accommodate optional weights that adjust for nonresponse, for example, 
constructed using propensity score methods and available baseline data (as discussed later in this 
section). This approach yields unbiased ATE estimates under the Missing at Random (MAR) 
assumption (Rubin, 1987) that, for each research group, missingness is random conditional on the 
observed baseline covariates. Separate sets of nonresponse weights can be used for different 
outcomes and subgroups. 

It is important that program users assess the credibility of the MCAR and MAR assumptions. For 
instance, users should (1) examine treatment-control differences in data response rates for students, 
classrooms, and schools; (2) compare the baseline characteristics of respondents in the treatment 
and control groups (using the nonresponse weights if pertinent); and (3) compare the baseline 
characteristics of respondents and nonrespondents in each research group to gauge the extent to 
which respondents appear to be representative of the full sample of respondents and 
nonrespondents. Users should also assess the robustness of the impact findings using the simple 
differences-in-means and regression estimators and using standard methods for imputing missing 
outcome data (see, for example, Puma et al., 2009) prior to inputting the data into RCT-YES. 

Next, we discuss design-based methods using case deletion and nonresponse weighting to adjust for 
missing data. Although we focus on weights to adjust for data nonresponse, similar methods can be 
used for weights to adjust for unequal sampling probabilities across study participants for other 
design-related reasons. 

Case deletion 

To examine the case deletion approach under the Neyman-Rubin-Holland model, we begin by 
defining “potential” data item response indicators in the treatment and control conditions, $R_i(T_i)$, 
where $R_i(T_i) = 1$ for those with available data and 0 for those with missing data. Potential outcomes, 
$Y_i(T_i, R_i(T_i))$, can be redefined to be functions of both treatment assignments and data response 
(there are four such potential outcomes), but we do not use this notation for simplicity. 

To obtain unbiased ATE estimates using case deletion, we invoke the following assumptions in 
addition to Assumption 4.2 in Chapter 4 that defines randomization: 

Assumptions 5.1: Ignorability of data nonresponse: (i) $R_i(T_i) = R_i \perp\!\!\!\perp T_i$ and (ii) $R_i \perp\!\!\!\perp 
(Y_i(1), Y_i(0)) \mid T_i = t$ for $t \in \{0, 1\}$, where $0 < P(R_i = 1) < 1$. 

The first assumption is that data item response is independent of treatment status. The second 
assumption is that response is independent of potential outcomes (that is, response is “ignorable”). 
These assumptions together imply that response is random across the treatment and control groups. 
Note that Assumption 5.1(i) is not required for consistency of the ATE estimators using case 
deletion, because the intervention can influence data response rates as long as response is random 
within each research group. For simplicity, we adopt the more restrictive assumptions which seem 
more plausible. 

Using this framework, the observed outcomes for respondents can be expressed in terms of potential 
outcomes as follows: 


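(5.41) $y_i = T_i R_i Y_i(1) + (1 - T_i) R_i Y_i(0)$. 

The simple differences-in-means estimator using the full sample of respondents and non-respondents 
can then be expressed as follows: 

(5.42) $(\bar{y}_T - \bar{y}_C) = \frac{\sum_{i=1}^{n} T_i R_i Y_i(1)}{\sum_{i=1}^{n} T_i R_i} - \frac{\sum_{i=1}^{n} (1 - T_i) R_i Y_i(0)}{\sum_{i=1}^{n} (1 - T_i) R_i}$. 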

The denominators in (5.42) are random variables because respondents are assumed to be randomly 
allocated to the treatment and control groups; thus, (5.42) is a ratio estimator. This situation is 
similar to the analysis of baseline subgroups from above. Thus, we can use the same methods as for 
the subgroup analysis to show that the estimator in (5.42) is unbiased by (1) conditioning on the 
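sample sizes of respondents, $n_{TR}$ and $n_{CR}$, and (2) averaging over the distribution of allocations ($A$)
of respondents to the treatment and control groups. Using this approach for the FP model, we
find that

(5.43) $E_A E_R[(\bar{y}_T - \bar{y}_C) \mid n_{TR}, n_{CR}] = \frac{1}{n_{TR}} \sum_{i=1}^{n} \frac{n_{TR}}{n} Y_i(1) - \frac{1}{n_{CR}} \sum_{i=1}^{n} \frac{n_{CR}}{n} Y_i(0) = \frac{1}{n} \sum_{i=1}^{n} (Y_i(1) - Y_i(0))$,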

which shows that the simple differences-in-means estimator is unbiased. 

Using a similar conditioning argument, we can calculate $Var(\bar{y}_T - \bar{y}_C)$ using the law of total
variance to generate an unconditional variance similar to (5.32), which can be estimated using the
conditional variance estimator in (5.10). A similar argument holds for the SP model. 


Using nonresponse weights 

Weights can be used in RCT-YES to adjust for data nonresponse or other design-related factors. In 
this section, we consider weighted estimators for the SP model; results for the FP model are very 
similar. Although we focus on weights to adjust for data nonresponse, similar methods can be used 
for weights to adjust for unequal sampling probabilities across study participants for other reasons. 

RCT-YES users can construct weights for each observation and include them in the input data file. 
A common method for constructing nonresponse weights is to use propensity score methods using 
detailed baseline data that are (1) available for both respondents and nonrespondents and (2) highly 
correlated with the outcomes (Rosenbaum and Rubin, 1983). An example of how to apply this 
method is as follows: (1) estimate a logit model where a 0/1 indicator of data response is regressed 
on baseline data; (2) calculate the predicted response probability (propensity score), $\hat{p}_i$, from the
fitted logit model for each person; and (3) set the nonresponse weight for each person as $w_i = (1 / \hat{p}_i)$
(or as the mean of $w_i$ for those assigned to the same propensity score class based on the size of their
$\hat{p}_i$ values). This approach is typically applied separately for the treatment and control groups,
because patterns of nonresponse could differ across the two research groups. 
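
Step (3) of this procedure can be sketched in a few lines of Python. This is a minimal illustration, not RCT-YES code: the function name `nonresponse_weights`, the `n_classes` argument, and the equal-sized class construction are assumptions, and the predicted response probabilities from steps (1) and (2) are taken as given.

```python
from statistics import mean


def nonresponse_weights(p_hat, n_classes=None):
    """Return w_i = 1 / p_hat_i, or the class mean of 1 / p_hat_i when
    n_classes propensity score classes are requested (hypothetical helper)."""
    if any(not 0.0 < p <= 1.0 for p in p_hat):
        raise ValueError("predicted response probabilities must lie in (0, 1]")
    raw = [1.0 / p for p in p_hat]
    if n_classes is None:
        return raw
    # Rank persons by propensity score and cut the ranking into classes of
    # (nearly) equal size; this class construction is an assumption.
    order = sorted(range(len(p_hat)), key=lambda i: p_hat[i])
    weights = [0.0] * len(p_hat)
    for c in range(n_classes):
        lo = c * len(order) // n_classes
        hi = (c + 1) * len(order) // n_classes
        members = order[lo:hi]
        if members:
            class_mean = mean(raw[i] for i in members)
            for i in members:
                weights[i] = class_mean
    return weights
```

In keeping with the text, the function would typically be called once for the treatment group and once for the control group.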

The considered SP impact parameter with weights can be defined by (1) calculating a separate ATE
parameter conditional on each value of the weight and (2) averaging these ATE parameters over the
distribution of the weights in the population ($W$):

(5.44) $\beta_{nclus,SP,W} = \frac{E_W[w \, E_I(Y_i(1) - Y_i(0) \mid w_i = w)]}{E_W(w_i)}$.

To develop ATE estimators for this SP parameter under the Neyman-Rubin-Holland model, we
define “potential” weights in the treatment and control conditions as $w_i(T_i)$. Potential outcomes
can now be expressed as a function of $w_i$, but we suppress this notation for simplicity. The
weights are also likely to be a function of baseline covariates, so conditioning on the weights is
synonymous with conditioning on specific values for the baseline characteristics.

To obtain consistent estimators of the ATE parameter in (5.44), we invoke the following simplifying 
assumptions: 

Assumptions 5.2: Ignorability of nonresponse conditional on the nonresponse weights:
(i) $w_i(T_i) = w_i \perp T_i$, (ii) $R_i(T_i) = R_i \perp T_i \mid w_i = w$, and (iii) $R_i \perp (Y_i(1), Y_i(0)) \mid w_i = w, T_i = t$ for all
$w \in W$ and $t \in \{0, 1\}$, where $0 < P(w_i = w) < 1$ and $0 < P(R_i = 1 \mid w_i = w) < 1$.

The first assumption implies that the weights are independent of treatment status, so that patterns 
of missing data (propensity score coefficients) do not differ for the treatment and control groups. 
The second assumption states that conditional on the weight (estimated propensity score), response 
rates are the same in the treatment and control groups. The third assumption states that given the 
weight, individuals are randomized to response status within each research condition. 

These assumptions jointly imply that conditional on the weight, individuals are randomized 
independently to both response status and treatment-control status. Because the nonresponse 
weights are functions of the baseline covariates, these assumptions imply that observable baseline 
covariates fully account for potential selection biases due to nonresponse. In essence, the weighting 
classes can be considered to be baseline subgroups. Note that Assumptions 5.2 are more restrictive
than is necessary to establish consistency, which requires only the third condition (that is, given the
weight, individuals are randomly assigned to response status within each research group, but not
necessarily across research groups). We instead invoke the more restrictive Assumptions 5.2, which
simplify the proofs and notation.

The weighted simple-differences-in-means estimator is 


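(5.45) $\hat{\beta}_{nclus,SP,W} = (\bar{y}_{TW} - \bar{y}_{CW}) = \frac{\sum_{i=1}^{n} T_i R_i w_i y_i}{\sum_{i=1}^{n} T_i R_i w_i} - \frac{\sum_{i=1}^{n} (1 - T_i) R_i w_i y_i}{\sum_{i=1}^{n} (1 - T_i) R_i w_i}$.

This estimator is biased if the weights vary across the sample because the denominator terms are
random variables. Under Assumptions 5.2, however, the estimator $\hat{\beta}_{nclus,SP,W}$ is consistent because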




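(5.46) $\hat{\beta}_{nclus,SP,W} \xrightarrow{p} \frac{\sum_{i=1}^{n} E_{RI}[T_i R_i w_i Y_i(1)] / n}{\sum_{i=1}^{n} E_{RI}[T_i R_i w_i] / n} - \frac{\sum_{i=1}^{n} E_{RI}[(1 - T_i) R_i w_i Y_i(0)] / n}{\sum_{i=1}^{n} E_{RI}[(1 - T_i) R_i w_i] / n}$

$= \frac{E_I[w_i Y_i(1)] \, r p_W}{E_I(w_i) \, r p_W} - \frac{E_I[w_i Y_i(0)] \, r (1 - p_W)}{E_I(w_i) \, r (1 - p_W)} = \beta_{nclus,SP,W}$,

where $p_W$ is the weighted treatment group sampling rate and $r = P(R_i = 1)$ is the probability of
response.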
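
The weighted simple differences-in-means estimator can be illustrated with a short Python sketch. It assumes the estimator is the difference between the weighted mean outcomes of treatment and control respondents, as described for (5.45); the function name `weighted_diff_in_means` and the optional response indicator `r` are hypothetical and are not part of RCT-YES.

```python
def weighted_diff_in_means(y, t, w, r=None):
    """Weighted treatment-control difference in mean outcomes among respondents.

    y: outcomes (ignored when r[i] == 0), t: 0/1 treatment indicators,
    w: nonresponse weights, r: optional 0/1 response indicators.
    """
    if r is None:
        r = [1] * len(y)
    num_t = den_t = num_c = den_c = 0.0
    for y_i, t_i, w_i, r_i in zip(y, t, w, r):
        if not r_i:
            continue  # nonrespondents contribute to neither sum
        if t_i == 1:
            num_t += w_i * y_i
            den_t += w_i
        else:
            num_c += w_i * y_i
            den_c += w_i
    if den_t == 0.0 or den_c == 0.0:
        raise ValueError("each research group needs at least one respondent")
    return num_t / den_t - num_c / den_c
```

Because each group mean is a ratio of weighted sums, rescaling the weights within a research group leaves the estimate unchanged.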

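RCT-YES adopts a large-sample approach to calculate the variance of $\hat{\beta}_{nclus,SP,W}$. The following lemma
is proved in Appendix A.

Lemma 5.6. Let $\hat{\beta}_{nclus,SP,W} = (\bar{y}_{TW} - \bar{y}_{CW})$ be the weighted simple differences-in-means estimator in (5.45)
and invoke Assumptions 5.2. Then, $\hat{\beta}_{nclus,SP,W}$ is asymptotically normal with asymptotic mean $\beta_{nclus,SP,W}$ and
asymptotic variance:

(5.47) $AsyVar_{RI}(\hat{\beta}_{nclus,SP,W}) = \frac{1}{E_I(w_i)^2} \left[ \frac{\sigma^2_{TW}}{n p_W} + \frac{\sigma^2_{CW}}{n (1 - p_W)} \right]$,

where $\sigma^2_{TW} = E_I(w_i^2 (Y_i(1) - \mu_{TW})^2)$ and $\sigma^2_{CW} = E_I(w_i^2 (Y_i(0) - \mu_{CW})^2)$.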


RCT-YES uses the following variance estimator for (5.47): 


(5.48) $\widehat{AsyVar}_{RI}(\hat{\beta}_{nclus,SP,W}) = \frac{s^2_{TW}}{\bar{w}_T^2 n \hat{p}_W} + \frac{s^2_{CW}}{\bar{w}_C^2 n (1 - \hat{p}_W)}$,

where $\bar{w}_T = (\sum_{i:T_i=1} w_i / n_T)$ and $\bar{w}_C = (\sum_{i:T_i=0} w_i / n_C)$ are average weights, and
$s^2_{TW} = [\sum_{i:T_i=1} w_i^2 (y_i - \bar{y}_{TW})^2 / (n_T - 1)]$ and $s^2_{CW} = [\sum_{i:T_i=0} w_i^2 (y_i - \bar{y}_{CW})^2 / (n_C - 1)]$ are weighted
sample variances. Note that if the weights for the treatment group sum to the full treatment group
sample size of respondents and nonrespondents and similarly for the control group, we could instead
estimate $p_W$ using $\hat{p}_W = (\sum_{i:T_i=1} w_i / \sum_{i=1}^{n} w_i)$. However, because it is unclear how RCT-YES users
will scale the nonresponse weights for the treatment and control groups, RCT-YES does not adopt
this approach and instead uses $\hat{p}_W = p = (n_T / (n_T + n_C))$. RCT-YES uses the same estimation
approach for the FP model with weights as for the SP model with weights, except that it subtracts
from (5.48) the FP heterogeneity term, $((s_{TW} / \bar{w}_T) - (s_{CW} / \bar{w}_C))^2 / n$.

Similar methods can be used to incorporate nonresponse weights into the regression estimators with
covariates. For these models, let $\tilde{z}_i = (1 \; \tilde{T}_i \; \tilde{x}_{iW})$ be the centered model explanatory variables
where $\tilde{x}_{iW}$ is a weighted version of the covariate vector $\tilde{x}_i$ and $\tilde{T}_i = (T_i - p_W) = (T_i - p)$. We
assume that data item response is independent of $\tilde{x}_i$ conditional on the weights. The weighted
regression estimator, $\hat{\beta}_{nclus,MR,SP,W}$, is the coefficient on $\tilde{T}_i$ from a weighted regression of $y_i$ on $\tilde{z}_i$; it is consistent
and asymptotically normal (which can be shown using methods very similar to those used to prove
Lemmas 5.4 and 5.6). RCT-YES estimates the asymptotic variance of $\hat{\beta}_{nclus,MR,SP,W}$ using model
residuals from the fitted weighted regression model:


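(5.49) $\widehat{AsyVar}_{RI}(\hat{\beta}_{nclus,MR,SP,W}) = \frac{MSE_{TW}}{\bar{w}_T^2 n p} + \frac{MSE_{CW}}{\bar{w}_C^2 n (1 - p)}$,

where

$MSE_{TW} = \frac{1}{(n - v) p - 1} \sum_{i:T_i=1} w_i^2 (y_i - \hat{\beta}_0 - (1 - p) \hat{\beta}_{nclus,MR,SP,W} - \tilde{x}_{iW} \hat{\gamma})^2$;

$MSE_{CW} = \frac{1}{(n - v)(1 - p) - 1} \sum_{i:T_i=0} w_i^2 (y_i - \hat{\beta}_0 + p \hat{\beta}_{nclus,MR,SP,W} - \tilde{x}_{iW} \hat{\gamma})^2$;

and $\hat{\beta}_0$, $\hat{\beta}_{nclus,MR,SP,W}$, and $\hat{\gamma}$ are parameter estimates from a weighted regression of $y_i$ on $\tilde{z}_i$. The
same estimator is used for the FP model, except that the FP heterogeneity term
$((\sqrt{MSE_{TW}} / \bar{w}_T) - (\sqrt{MSE_{CW}} / \bar{w}_C))^2 / n$ is subtracted from (5.49). This estimator has not been used
in the literature.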
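
The variance calculation for the weighted simple differences-in-means estimator in (5.48), including the FP heterogeneity adjustment, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the RCT-YES implementation: it assumes the inputs contain respondents only, that the weighted sample variances use squared weights, $s^2_{TW} = \sum w_i^2 (y_i - \bar{y}_{TW})^2 / (n_T - 1)$, and that $\hat{p}_W = n_T / (n_T + n_C)$. The function name and the `finite_population` flag are hypothetical.

```python
from math import sqrt


def weighted_variance(y, t, w, finite_population=False):
    """Variance estimate for the weighted difference in means (sketch of 5.48)."""
    n = len(y)
    stats = {}
    for arm in (1, 0):
        idx = [i for i in range(n) if t[i] == arm]
        n_arm = len(idx)
        w_sum = sum(w[i] for i in idx)
        w_bar = w_sum / n_arm                               # average weight
        y_bar = sum(w[i] * y[i] for i in idx) / w_sum       # weighted mean
        s2 = sum(w[i] ** 2 * (y[i] - y_bar) ** 2 for i in idx) / (n_arm - 1)
        stats[arm] = (n_arm, w_bar, s2)
    (n_t, wbar_t, s2_t), (n_c, wbar_c, s2_c) = stats[1], stats[0]
    p = n_t / (n_t + n_c)                                   # unweighted sampling rate
    var = s2_t / (wbar_t ** 2 * n * p) + s2_c / (wbar_c ** 2 * n * (1 - p))
    if finite_population:
        # Subtract the FP heterogeneity term described in the text.
        var -= ((sqrt(s2_t) / wbar_t) - (sqrt(s2_c) / wbar_c)) ** 2 / n
    return var
```

With unit weights, the SP estimate reduces to the familiar $s_T^2 / n_T + s_C^2 / n_C$.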


RCT-YES uses the same estimation approach for subgroup analyses with nonresponse weights. The 
SP model variance estimator for the simple differences-in-means estimator for subgroup g is 


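(5.50) $\widehat{AsyVar}(\hat{\beta}_{g,nclus,SP,W}) = \frac{s^2_{TgW}}{\bar{w}_{Tg}^2 n_g p_g} + \frac{s^2_{CgW}}{\bar{w}_{Cg}^2 n_g (1 - p_g)}$,

where

$s^2_{TgW} = \sum_{i:G_{ig}=1,T_i=1} w_i^2 (y_i - \bar{y}_{TgW})^2 / (n_{Tg} - 1)$ and $s^2_{CgW} = \sum_{i:G_{ig}=1,T_i=0} w_i^2 (y_i - \bar{y}_{CgW})^2 / (n_{Cg} - 1)$.

The FP model variance estimator is identical except that the FP heterogeneity term
$((s_{TgW} / \bar{w}_{Tg}) - (s_{CgW} / \bar{w}_{Cg}))^2 / n_g$ is subtracted from (5.50). Similarly, using weighted versions of
(5.36) and Lemma 5.5, RCT-YES uses the following SP model variance estimator for weighted
regression estimators for subgroup models with baseline covariates (and similarly for the FP model
where the FP heterogeneity term is subtracted from the SP variance estimator):

(5.51) $\widehat{AsyVar}(\hat{\beta}_{g,nclus,MR,SP,W}) = \frac{MSE_{TgW}}{\bar{w}_{Tg}^2 n p_g q_g} + \frac{MSE_{CgW}}{\bar{w}_{Cg}^2 n (1 - p_g) q_g}$, where

$MSE_{TgW}$ and $MSE_{CgW}$ are sums of squared weighted residuals from the fitted weighted subgroup regression
model for treatment and control group members in subgroup $g$, divided by $((n - v) p_g q_g - 1)$ and
$((n - v)(1 - p_g) q_g - 1)$, respectively,

$p_g = (n_{Tg} / n_g)$, and $q_g = (n_g / n)$.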


Finally, for the baseline equivalency analysis with weights, RCT-YES calculates the following weighted 
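t-statistic for both the SP and FP models:

(5.52) $t_{kW} = \hat{\delta}_{kW} / \sqrt{s^2_{kW} \left( \frac{1}{n_T} + \frac{1}{n_C} \right)}$, where

$\hat{\delta}_{kW} = (\bar{x}_{TWk} - \bar{x}_{CWk})$, $\bar{x}_{TWk}$ and $\bar{x}_{CWk}$ are weighted means,

$s^2_{kW} = \frac{(n_T - 1)(s^2_{TkW} / \bar{w}_T^2) + (n_C - 1)(s^2_{CkW} / \bar{w}_C^2)}{n_T + n_C - 2}$, $s^2_{TkW} = \frac{1}{(n_T - 1)} \sum_{i:T_i=1} w_i^2 (x_{ik} - \bar{x}_{TWk})^2$, and

$s^2_{CkW} = \frac{1}{(n_C - 1)} \sum_{i:T_i=0} w_i^2 (x_{ik} - \bar{x}_{CWk})^2$.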


Note that to calculate the pooled variance, the treatment and control variances are weighted by their 
sample sizes rather than the sum of their weights because it is unclear how RCT-YES users will scale 
the weights for each research group. 
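
A small Python sketch of the weighted baseline equivalency t-statistic may help make the pooling rule concrete. It is a hedged illustration, not RCT-YES code: it assumes squared weights in the group variances and pooling by sample sizes (not by sums of weights), as the note above describes, and the function name `weighted_baseline_t` is hypothetical.

```python
from math import sqrt


def weighted_baseline_t(x, t, w):
    """Weighted t-statistic for one baseline covariate (sketch of 5.52)."""
    stats = {}
    for arm in (1, 0):
        idx = [i for i in range(len(x)) if t[i] == arm]
        n_arm = len(idx)
        w_sum = sum(w[i] for i in idx)
        w_bar = w_sum / n_arm
        x_bar = sum(w[i] * x[i] for i in idx) / w_sum        # weighted mean
        s2 = sum(w[i] ** 2 * (x[i] - x_bar) ** 2 for i in idx) / (n_arm - 1)
        stats[arm] = (n_arm, w_bar, x_bar, s2)
    (n_t, wbar_t, xbar_t, s2_t), (n_c, wbar_c, xbar_c, s2_c) = stats[1], stats[0]
    # Pool the two variances using sample sizes rather than sums of weights.
    pooled = ((n_t - 1) * s2_t / wbar_t ** 2
              + (n_c - 1) * s2_c / wbar_c ** 2) / (n_t + n_c - 2)
    return (xbar_t - xbar_c) / sqrt(pooled * (1.0 / n_t + 1.0 / n_c))
```

With unit weights the statistic equals the usual pooled-variance two-sample t-statistic.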

To test hypotheses that covariate means are jointly similar, RCT-YES uses a weighted version of 
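(5.40b) using the vector $\hat{\delta}_W$ containing the $\hat{\delta}_{kW}$ estimates and the following variance-covariance
matrix:

$\hat{V}_{\delta W}(k, k') = \left[ \frac{(n_T - 1)[S_{TW}(k, k') / \bar{w}_T^2] + (n_C - 1)[S_{CW}(k, k') / \bar{w}_C^2]}{n_T + n_C - 2} \right] \left[ \frac{1}{n_T} + \frac{1}{n_C} \right]$, where

$S_{TW}(k, k') = \frac{1}{(n_T - 1)} \sum_{i:T_i=1} w_i^2 (x_{ik} - \bar{x}_{TWk})(x_{ik'} - \bar{x}_{TWk'})$ and

$S_{CW}(k, k') = \frac{1}{(n_C - 1)} \sum_{i:T_i=0} w_i^2 (x_{ik} - \bar{x}_{CWk})(x_{ik'} - \bar{x}_{CWk'})$.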

i. Treatment of missing covariate and subgroup data 

Missing covariate data. For RCTs, the use of regression models that control for baseline covariates 
can improve the precision of the ATE estimates. However, these covariates are not required for 
impact estimation, because simple differences-in-means methods produce unbiased (or consistent) 
ATE estimates. Thus, RCT-YES includes in the analysis individuals with valid outcome data but 
missing covariate data. 

Users can impute missing covariates themselves and input them into RCT-YES. If not, RCT-YES uses 
the following approach to adjust for missing covariates depending on the prevalence of missing data: 

• The covariate is missing for 30 percent or fewer cases for both the treatment and control
groups. In this case, the program imputes missing covariate values using covariate means for
the sample with non-missing covariate values, separately for the treatment and control
groups, and separately for specified blocks and clusters.^ The imputations are also conducted
separately for each specified outcome measure (which may have different percentages and 
patterns of missing data). If pertinent, nonresponse weights are used for the imputations. 

• The covariate is missing for more than 30 percent of cases for either research group. In
this case, the covariate is dropped from the analysis.

The 30 percent missing data cutoff rule is consistent with results from the data nonresponse analysis 
conducted by IES’s What Works Clearinghouse (WWC) for assessing acceptable levels of biases in
the impact estimates due to missing outcome data. The 30 percent cutoff can be changed using the 
MISSING_COV program option (see Table 2). 
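
The two-branch rule above can be sketched in Python. The sketch is a simplified illustration, not the RCT-YES implementation: it ignores blocks, clusters, outcome-specific missing data patterns, and nonresponse weights, and the function name `impute_covariate` and its `cutoff` argument (standing in for the MISSING_COV option) are hypothetical.

```python
def impute_covariate(x, t, cutoff=0.30):
    """Mean-impute a covariate by research group, or return None to drop it.

    x: covariate values with None marking missing data, t: 0/1 indicators.
    """
    imputed = list(x)
    for arm in (0, 1):
        idx = [i for i in range(len(x)) if t[i] == arm]
        observed = [x[i] for i in idx if x[i] is not None]
        if not idx or not observed:
            return None  # nothing to impute from in this research group
        missing_rate = 1.0 - len(observed) / len(idx)
        if missing_rate > cutoff:
            return None  # too much missing data in either group: drop covariate
        arm_mean = sum(observed) / len(observed)
        for i in idx:
            if imputed[i] is None:
                imputed[i] = arm_mean
    return imputed
```

Returning `None` signals that the covariate would be excluded from the regression model.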

Missing subgroup data. For subgroup analyses, RCT-YES excludes cases that have missing values for
the subgroup variables. For example, if gender is missing for an individual, RCT-YES will exclude
this observation from the analysis when estimating impacts for boys and girls (even if that 
observation has available outcome data). 


^ For Design 2, a missing covariate is imputed using the treatment or control group block-level mean if the covariate is missing for 30 
percent or fewer cases for both research groups in the block; otherwise the full sample treatment or control group mean is used. 




j. Identification of problem covariates 

RCT-YES conducts three types of analyses to identify potential data problems with covariates for the 
regression analyses. First, the program examines whether the ratio of the number of observations to 
the number of covariates is small, which can lead to model over-fitting. By default, RCT-YES uses the 
rule that there must be at least 5 observations per covariate for non-clustered designs and 5 clusters 
per covariate for clustered designs or the regression analysis is not performed. Thus, for example, in 
a clustered design, if the sample contains data on 20 schools (10 treatment and 10 control schools), 
the model can contain a maximum of 4 covariates in addition to the treatment status indicator and 
intercept terms. If the user specifies 6 covariates, the program will not estimate the regression models, 
but will alert the user to the problem. The cutoff rule of 5 can be changed using the OBS_COV 
option (see Table 2). RCT-YES checks these conditions for each outcome. The program uses the 
same cutoff rule for the baseline equivalency analysis to test the hypothesis that covariate means are 
jointly similar across the treatment and control groups. 
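
The observations-per-covariate rule amounts to a simple integer check, sketched below in Python. The function names are hypothetical, and `min_units_per_covariate` stands in for the OBS_COV option; units are observations for non-clustered designs and clusters for clustered designs.

```python
def max_covariates(n_units, min_units_per_covariate=5):
    """Largest number of covariates allowed under the cutoff rule."""
    return n_units // min_units_per_covariate


def regression_allowed(n_units, n_covariates, min_units_per_covariate=5):
    """True if the regression analysis would be performed under the rule."""
    return n_covariates <= max_covariates(n_units, min_units_per_covariate)
```

For the 20-school example in the text, `max_covariates(20)` gives 4, so a model with 6 covariates would not be estimated.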

Second, the program examines whether there are large correlations among the covariates. RCT-YES 
estimates regression models regardless of covariate collinearity by using generalized inverses to invert
matrices such as $(\tilde{z}'\tilde{z})$ to calculate the estimators. Nonetheless, the program calculates variance
inflation factors (VIFs) to alert the user of collinear covariates so that some can be omitted from the
analysis in subsequent runs of the program. To describe the calculation of the VIFs, it is convenient
to standardize the covariates using

$s_{ikT} = (x_{ik} - \bar{x}_{Tk}) / \sqrt{\sum_{i:T_i=1} (x_{ik} - \bar{x}_{Tk})^2}$ and $s_{ikC} = (x_{ik} - \bar{x}_{Ck}) / \sqrt{\sum_{i:T_i=0} (x_{ik} - \bar{x}_{Ck})^2}$

for the treatment and control groups, respectively, with associated row vectors $s_{iT}$ and $s_{iC}$. If we let
the matrices of standardized covariates be $S_T$ and $S_C$, VIFs can be
defined as $VecDiag((S_T' S_T)^{-1})$ and $VecDiag((S_C' S_C)^{-1})$. The rows of these vectors equal $1 / (1 - R_k^2)$,
where $R_k^2$ is the $R^2$ value from a regression of covariate $k$ on the other ($v - 1$) covariates. RCT-YES
reports these $R_k^2$ values. If $R_k^2$ values are large (for example, greater than .90), users might want to
exclude the “duplicate” covariates from the analysis.
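
The VIF calculation can be illustrated for one research group with a standard-library Python sketch. It assumes the covariates are standardized by the square root of their sums of squared deviations, so that the cross-product matrix is the correlation matrix, and it uses an ordinary Gauss-Jordan inverse (which fails for perfectly collinear covariates) rather than the generalized inverse that RCT-YES uses. All function names are hypothetical.

```python
from math import sqrt


def standardize(columns):
    """Center each covariate and scale it to unit sum of squares."""
    out = []
    for col in columns:
        m = sum(col) / len(col)
        dev = [v - m for v in col]
        norm = sqrt(sum(d * d for d in dev))
        out.append([d / norm for d in dev])
    return out


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (ordinary inverse only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariates are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """Diagonal of the inverse correlation matrix: one VIF per covariate."""
    s = standardize(columns)
    k = len(s)
    corr = [[sum(a * b for a, b in zip(s[i], s[j])) for j in range(k)]
            for i in range(k)]
    inv = invert(corr)
    return [inv[i][i] for i in range(k)]
```

Each $R_k^2$ value follows from the corresponding VIF as $1 - 1 / VIF_k$.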

Finally, RCT-YES calculates and prints out bivariate correlations between each outcome measure and 
each baseline covariate so that users can identify covariates with good predictive power. If the 
correlation between the outcome and a covariate is 1 or -1 (for example, because the outcome was 
mistakenly included as a covariate), RCT-YES excludes that covariate from the regression analysis.

k. Effect size calculations

RCT-YES reports estimated impacts in both nominal units and effect size (standard deviation) units. 
It is becoming increasingly popular in educational research to standardize estimated impacts into 
effect size units to facilitate the comparison of impact findings across outcomes that are measured 
on different scales (Hedges, 1981, 2007). This approach has been widely used in meta-analyses to 
contrast and collate impact findings across a broad range of disciplines (Cohen, 1988; Lipsey and 
Wilson, 1993). Analyzing outcomes on a common scale also makes it possible to combine results 
across different grades and states within a particular study. The use of effect sizes is especially 
important for helping to understand impact findings on outcomes that are difficult to interpret 
when measured in nominal units (for example, impacts on behavioral scales or test scores). In 
addition, it has become standard practice in education evaluations to conduct power analyses using 
primary outcomes that are measured in effect size units. 

RCT-YES users will need to appropriately scale their outcome measures prior to analysis to fit their 
particular contexts. For example, for evaluations that are being conducted across grades or states that 
use different achievement tests, outcome data can be converted to z-scores separately for each grade 
and state by standardizing the measures using statewide means and standard deviations (the 
preferred method) or using sample means and standard deviations (see May et al., 2009).

Regardless of how the data are scaled, by default, RCT-YES calculates impacts in effect size units for 
each outcome by dividing the estimated impacts in nominal units by the sample standard deviation 
of the outcome measure across control students (the status quo condition). Mathematically, for Design
1, RCT-YES calculates effect size impacts using $\hat{\beta}_{nclus,FP,ES} = (\hat{\beta}_{nclus,FP} / s_C)$, where $s_C$ is the sample
standard deviation for control group students (which, if germane, is calculated using weights).^ If
desired, users can instead use the STD_OUTCOME option to input standard deviations for each 
outcome (for example, the value in the entire state or as reported by the test publisher from a 
norming sample). 
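
The default effect size calculation can be sketched in Python as follows. The sketch covers only the unweighted control group standard deviation and a user-supplied value; the function name `effect_size` is hypothetical, and its `std_outcome` argument stands in for the STD_OUTCOME option.

```python
from statistics import stdev


def effect_size(impact, y_control, std_outcome=None):
    """Impact in standard deviation units.

    Uses the user-supplied standard deviation if given; otherwise the
    unweighted sample standard deviation of control group outcomes.
    """
    sd = std_outcome if std_outcome is not None else stdev(y_control)
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return impact / sd
```

For example, an impact of 5 scale points with a publisher-reported standard deviation of 10 corresponds to an effect size of 0.5.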

Importantly, for subgroup analyses, RCT-YES uses the same standard deviation as for the full sample 
to facilitate comparisons of impact findings across subgroups. The same approach is used for 
clustered and blocked designs (Designs 2, 3, and 4). For clustered designs where the input data are 
cluster-level averages (CLUSTER_DATA=0), RCT-YES will conduct the effect size calculations only 
if the STD_OUTCOME across control group students is specified. 


^ In the variance calculations, RCT-YES ignores the estimation error in $s_C$. As discussed in Schochet and Chiang (2011), the
incorporation of these variance components has very small effects on the overall variances in empirical applications.

For baseline equivalency analyses, RCT-YES uses the pooled standard deviation across both treatment
and control group students to calculate effect sizes. We adopt this approach because the treatment 
and control variances of the baseline variables should be equal due to randomization. 

Importantly, consumers of RCT findings must use a broader set of criteria than the magnitude of 
the estimated effect sizes to gauge whether evaluation findings are meaningful and relevant for 
policy. Hill et al. (2008) and Lipsey et al. (2012) provide a framework for interpreting ATEs for 
education evaluations that could also be applied to RCTs in other social policy fields. For instance, 
in the educational context, they suggest that researchers examine study findings in terms of the 
natural growth in student achievement in a school year, policy-relevant performance gaps across 
student subgroups or schools, and observed effect sizes from previous similar evaluations. One could 
also adopt a benefit-cost standard to examine whether an intervention yields benefits in dollar terms
(for example, higher future income) that exceed intervention costs.

l. The CACE parameter

The ATE estimator provides information on treatment effects for those in the study population who 
were offered intervention services. The treatment group sample used to estimate this parameter, 
however, might include not only students who received services but also those who did not. 
Similarly, the control group sample may include crossovers who received embargoed intervention 
services. In these cases, the ATE estimates may understate intervention effects for those who were 
eligible for and actually received services (assuming that the intervention improves outcomes). Thus, 
it is often of policy interest to estimate the causal average complier effect (CACE) parameter that
pertains to compliers— those who would receive intervention services as a treatment but not as a 
control (see, for example, Angrist, Imbens, and Rubin, 1996; Bloom, 1984; Heckman, Smith, and 
Taber, 1998; and Schochet and Chiang, 2011). 

If data are available on the take-up of intervention services by treatment and control group members, 
RCT-YES users can obtain estimates of the CACE parameter by inputting names of up to two binary
service receipt variables using the GOT_TREAT input variable. RCT-YES conducts the ATE and
CACE analyses using only observations with nonmissing data for both the service receipt and
outcome variables (to ensure that the two sets of analyses are comparable). 

It is important to recognize that if treatment group noncompliers existed in the evaluation sites, they 
are likely to exist if the intervention were implemented more broadly. Thus, the ATE parameter 
pertains to real-world treatment effects. The CACE parameter, however, is important for
understanding the “pure” effects of the intervention for those who received a meaningful dose of
intervention services, especially for efficacy studies that aim to assess whether the studied
intervention can work. Decision makers may also be interested in the CACE parameter if they
believe that intervention implementation could be improved in their sites. Furthermore, the CACE
parameter can be critical for drawing policy lessons from ATE estimates; for instance, the CACE
parameter can distinguish whether a small ATE is due to low rates of compliance or due to
small treatment effects among compliers.

In this section, we consider identification and estimation of the CACE parameter for the SP model;
results for the FP model are very similar. Because the literature has conceptualized compliance
decisions as dichotomous (Angrist et al. 1996), we model the receipt of services as a binary decision.
Note that values for $T_i$ are not affected by compliance decisions.

Identification 

In what follows, we introduce new notation. Let $D_i = D_i(T_i)$ denote an indicator variable that
equals 1 if student $i$ would receive intervention services if assigned to a given treatment condition
($T_i = 0$ or $T_i = 1$), and let $Y_i(T_i, D_i)$ denote the individual’s potential outcome for a given value of
$(T_i, D_i)$; there are four such potential outcomes.

To examine identification of the CACE parameter, we classify individuals in the super-population
into four mutually exclusive compliance categories: compliers, never-takers, always-takers, and defiers
(Angrist et al. 1996). Compliers (CL) are those who would receive intervention services if and only if
they were assigned to the treatment group [$D_i(1) = 1$ and $D_i(0) = 0$]. Never-takers (N) are those who
would never receive treatment services [$D_i(1) = 0$ and $D_i(0) = 0$], and always-takers (A) are those
who would always receive treatment services [$D_i(1) = 1$ and $D_i(0) = 1$]. Finally, defiers (D) are those
who would receive treatment services only in the control condition [$D_i(1) = 0$ and $D_i(0) = 1$].
Outcome data are assumed to be available for all sample members. 
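
The four compliance categories are a lookup on the pair of potential service receipt indicators, as the following conceptual Python sketch shows. In practice only one of $D_i(1)$ and $D_i(0)$ is observed for each person, so the classification cannot be applied to real data; the function names are hypothetical.

```python
COMPLIANCE_TYPES = {
    (1, 0): "complier",      # receives services only if assigned to treatment
    (0, 0): "never-taker",   # never receives services
    (1, 1): "always-taker",  # always receives services
    (0, 1): "defier",        # receives services only if assigned to control
}


def compliance_type(d1, d0):
    """Classify a person from potential service receipt D(1) and D(0)."""
    return COMPLIANCE_TYPES[(d1, d0)]


def satisfies_monotonicity(pairs):
    """True if D(1) >= D(0) for every (D(1), D(0)) pair, i.e., no defiers."""
    return all(d1 >= d0 for d1, d0 in pairs)
```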

The CACE parameter for the SP model is $\beta_{nclus,SP,CL} = E_I(Y_i(1,1) - Y_i(0,0))$. To examine the
assumptions required to identify this parameter, we express the ATE parameter, $\beta_{nclus,SP}$, as a
weighted average of the ATE parameters for each of the four unobserved compliance groups:

(5.53) $\beta_{nclus,SP} = p_{CL} \beta_{nclus,SP,CL} + p_N \beta_{nclus,SP,N} + p_A \beta_{nclus,SP,A} + p_D \beta_{nclus,SP,D}$,

where $p_c$ is the fraction of the study population in compliance group $c$ ($\sum_c p_c = 1$), and $\beta_{nclus,SP,c}$
is the associated ATE impact parameter, where $\beta_{nclus,SP,N} = E_I(Y_i(1,0) - Y_i(0,0))$,
$\beta_{nclus,SP,A} = E_I(Y_i(1,1) - Y_i(0,1))$, $\beta_{nclus,SP,D} = E_I(Y_i(1,0) - Y_i(0,1))$, and $\beta_{nclus,SP,CL}$ was defined above.

Following Angrist et al. (1996), the CACE parameter, $\beta_{nclus,SP,CL}$, can be identified under the
following three key assumptions:

Assumption 1. Stable Unit Treatment Value Assumption (SUTVA): Potential compliance
decisions $D_i(T_i)$ and potential outcomes $Y_i(T_i, D_i)$ are unrelated to the treatment status of other
individuals, and $Y_i(T_i, D_i)$ is unrelated to the service receipt status of other individuals.

This version of SUTVA (which generalizes the simpler version in Chapter 4, Section f) allows us to
express $Y_i(T_i, D_i)$ in terms of $T_i$ and $D_i$ rather than the vector of treatment and service receipt
statuses of all individuals.

Assumption 2. Monotonicity: $D_i(1) \ge D_i(0)$.

Monotonicity means that $D_i$ values are at least as large in the treatment condition as in the control
condition, and implies that there are no defiers (that is, $p_D = 0$). Under this assumption,
$p_{CL} = P(D_i(1) = 1) - P(D_i(0) = 1)$, which is the difference between service receipt rates in the
treatment and control conditions.

Assumption 3. Exclusion Restriction: $Y_i(1, r) = Y_i(0, r)$ for $r = 0, 1$.

The exclusion restriction means that the outcome for an individual who receives services would be
the same in the treatment or control condition, and similarly for an individual who does not receive
services. Stated differently, this restriction implies that any effect of $T_i$ on outcomes must occur only
through an effect of $T_i$ on service receipt. This restriction implies that impacts on always-takers and
never-takers are zero, that is, $\beta_{nclus,SP,N} = \beta_{nclus,SP,A} = 0$.

Assumption 4. Presence of compliers: $p_{CL} > 0$.

Using Assumptions 1 to 3, the final three terms on the right-hand-side of (5.53) cancel. Thus, the
following CACE parameter can be identified from the data (assuming the presence of compliers):

(5.54) $\beta_{CACE} = \beta_{nclus,SP,CL} = \beta_{nclus,SP} / p_{CL}$.

Impact and variation estimation 

A consistent estimator for the CACE parameter in (5.54) can be obtained by dividing consistent estimators for
$\beta_{nclus,SP}$ and $p_{CL}$:

(5.55) $\hat{\beta}_{CACE} = \hat{\beta}_{nclus,SP,CL} = \hat{\beta}_{nclus,SP} / \hat{p}_{CL}$.

Because of monotonicity, $p_{CL} = P(D_i(1) = 1) - P(D_i(0) = 1)$. Thus, estimators for $p_{CL}$ can be
obtained by noting that this parameter represents an impact on the rate of service receipt, and
estimation methods similar to those discussed above for $\beta_{nclus,SP}$ can be used to estimate $p_{CL}$. For
example, the simple differences-in-means estimator is $\hat{p}_{CL} = (\bar{d}_T - \bar{d}_C)$, where $\bar{d}_T = \sum_{i:T_i=1} d_i / n_T$,
$\bar{d}_C = \sum_{i:T_i=0} d_i / n_C$, and $d_i$ is an observed service receipt status indicator variable that equals 1 if
student $i$ received intervention services, and zero otherwise. Similarly, $p_{CL}$ can be estimated using
regression models that include baseline covariates. RCT-YES estimates $p_{CL}$ using the same methods
that program users specify for estimating $\beta_{nclus,SP}$, including the same baseline covariates for
regression analyses.
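
For the simple differences-in-means case, the CACE estimator is the ratio of two treatment-control differences, which can be sketched in Python as follows. The function names are hypothetical, and the sketch omits weights and covariates.

```python
def diff_in_means(v, t):
    """Treatment-control difference in the mean of v."""
    treat = [v_i for v_i, t_i in zip(v, t) if t_i == 1]
    control = [v_i for v_i, t_i in zip(v, t) if t_i == 0]
    return sum(treat) / len(treat) - sum(control) / len(control)


def cace_estimate(y, d, t):
    """Ratio of the outcome impact to the impact on service receipt.

    y: outcomes, d: 0/1 observed service receipt, t: 0/1 treatment assignment.
    """
    ate = diff_in_means(y, t)
    p_cl = diff_in_means(d, t)  # estimated complier share under monotonicity
    if p_cl <= 0:
        raise ValueError("estimated complier share must be positive")
    return ate / p_cl
```

For instance, an ATE estimate of 4 combined with a 50 percentage point difference in service receipt rates yields a CACE estimate of 8.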

The CACE estimator in (5.55) is an instrumental variables (IV) estimator where treatment assignment
($T_i$) is used as an instrument for service receipt status ($d_i$) in the regression model (Angrist et al. 1996). It is also a
ratio estimator, where both the numerator and denominator are measured with error (see Heckman
et al., 1994; Little et al., 2008; and Schochet and Chiang, 2011). Accordingly, both sources of error
should be taken into account in the variance calculations.

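A variance estimator for $\hat{\beta}_{CACE}$ can be obtained using an asymptotic Taylor series expansion of $\hat{\beta}_{CACE}$
around the true value $\beta_{CACE}$ (see Schochet and Chiang, 2011):

(5.56) $(\hat{\beta}_{CACE} - \beta_{CACE}) \approx \frac{(\hat{\beta}_{nclus,SP} - \beta_{nclus,SP})}{p_{CL}} - \frac{\beta_{nclus,SP} (\hat{p}_{CL} - p_{CL})}{p_{CL}^2}$.

Squaring both sides of (5.56), taking expectations, and inserting estimators for unknown
parameters yields the following variance estimator for $\hat{\beta}_{CACE}$:

(5.57) $\widehat{AsyVar}_{RI}(\hat{\beta}_{CACE}) = \frac{\widehat{AsyVar}_{RI}(\hat{\beta}_{nclus,SP})}{\hat{p}_{CL}^2} + \frac{\hat{\beta}_{CACE}^2 \widehat{AsyVar}_{RI}(\hat{p}_{CL})}{\hat{p}_{CL}^2} - \frac{2 \hat{\beta}_{CACE} \widehat{AsyCov}_{RI}(\hat{\beta}_{nclus,SP}, \hat{p}_{CL})}{\hat{p}_{CL}^2}$.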

The first term in (5.57) is the variance of the CACE estimator assuming that estimated service receipt
rates are measured without error. The second and third terms are therefore correction terms. The
second term accounts for the estimation error in $\hat{p}_{CL}$ and the third term accounts for the covariance
between $\hat{\beta}_{nclus,SP}$ and $\hat{p}_{CL}$.
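
The three-term structure of the variance estimator can be made concrete with a short Python sketch based on the Taylor series (delta method) approximation for a ratio. It assumes the variance and covariance inputs have already been estimated by other means and that the covariance term carries a factor of 2, as the expansion implies; the function name `cace_variance` and its argument names are hypothetical.

```python
def cace_variance(cace, p_cl, var_ate, var_p_cl, cov_ate_p_cl):
    """Delta-method variance of the ratio cace = ate / p_cl.

    var_ate: variance estimate for the ATE estimator,
    var_p_cl: variance estimate for the complier share estimator,
    cov_ate_p_cl: covariance estimate between the two estimators.
    """
    if p_cl == 0:
        raise ValueError("complier share must be nonzero")
    first = var_ate / p_cl ** 2                        # service receipt treated as known
    second = cace ** 2 * var_p_cl / p_cl ** 2          # error in the complier share
    third = -2.0 * cace * cov_ate_p_cl / p_cl ** 2     # covariance correction
    return first + second + third
```

Setting `var_p_cl` and `cov_ate_p_cl` to zero recovers the first term alone, which is the ATE variance scaled by the squared complier share.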


Asymptotic variance estimators in (5.57) can be obtained using the variance estimators presented
earlier in this chapter for both the SP and FP models. A consistent estimator for
$AsyCov_{RI}(\hat{\beta}_{nclus,SP}, \hat{p}_{CL})$ is as follows (using the general form with nonresponse weights):

(5.58)

In this expression, $\hat{y}_i$ and $\hat{d}_i$ are either variable means for the simple differences-in-means estimator
or predicted values from fitted regression models with baseline covariates (where the same covariates
are used for the service receipt and outcome variable regressions).


Because $\hat{\beta}_{nclus,SP}$ and $\hat{p}_{CL}$ are asymptotically normal, $\hat{\beta}_{CACE}$ is also asymptotically normal. Thus,
RCT-YES conducts hypothesis testing using the same approach as for the ATE estimators, including
the same degrees of freedom. A similar approach for estimating the CACE parameter is used for the
FP model and for subgroup analyses.


m. Reporting 

RCT-YES reports study findings in formatted tables. As discussed in detail in the RCT-YES User’s
Manual, RCT-YES first reports summary statistics on the specified outcomes, covariates, and 
subgroups before presenting the impact findings. The program outputs the following information: 

• Summary statistics on each specified outcome measure, separately for treatments and 
controls. The output includes sample sizes, missing data rates, and variable distributions 
(means, standard deviations, and 5th, 25th, 50th, 75th, and 95th percentiles) so that users 
can assess data quality and the presence of outliers. Summary statistics are also presented for 
the service receipt variables for the CACE analysis (if specified) and weights.

• Sample sizes and missing data rates for each requested subgroup. This information is 
presented separately by outcome measure and treatment-control status. 

• Information on the baseline covariates for the optional regression analysis. The output 
contains three types of information. First, it indicates reasons that a covariate is excluded
from the analysis (for example, because it has too many missing values). Second, the output
indicates if a covariate is highly collinear with the others, in which case the user might
consider omitting the covariate from the analysis to avoid needless losses in the degrees of
freedom for hypothesis testing. Finally, the output displays bivariate correlations between
the covariates and outcomes to help users identify covariates that can most improve the
precision of the impact estimates.

• Results from the baseline equivalency analysis (if requested). For each baseline measure,
RCT-YES displays treatment and control group means, the difference between the two
means, the difference in effect size units, the standard error of the difference, and the p-value
of the difference with an attached symbol ‘*’ indicating statistical significance at the 5 percent
level (the default) or at another specified level. The p-value for the test that covariate means
are jointly similar is shown in the header row for each outcome.

• Results from the impact analysis for each requested outcome and subgroup. The results 
are reported using a similar format as for the baseline equivalency analysis. To report 
regression-adjusted impact estimates, RCT-YES presents the unadjusted control group mean and 
the adjusted treatment group mean calculated as the sum of the unadjusted control group mean 
and the regression-adjusted impact estimate. For full sample analyses, the output indicates 
that an impact estimate remains statistically significant after applying the BH correction
using a symbol after the ‘*’ symbol attached to the p-value. For subgroup analyses, the
output presents impact findings for each subgroup, and presents the p-value for assessing
impact differences across subgroups in the header row for each subgroup. Separate tables are
produced for findings from the CACE analysis (if specified).

6. Design 2: The non-clustered, blocked design

This chapter discusses design-based methods for Design 2 where students are randomly assigned to 
a treatment or control group within blocks (strata). A common example of a blocked design is a 
multi-district RCT where randomization is conducted separately within school districts. Blocked 
designs also include longitudinal designs with multiple cohorts (for example, ninth graders in two 
consecutive years) where random assignment is conducted separately by cohort. Blocked designs also 
include two types of designs that are often used in education research: (1) matched pair designs 
where similar units are paired and random assignment is then conducted within each pair, and (2) 
designs where random assignment is conducted separately within demographic subgroups (for 
example, for girls and boys) to ensure treatment-control group balance for each subgroup. 

Blocked designs are common in education research. An example of a non-clustered, blocked design 
is the Evaluation of Charter School Impacts (Gleason, Clark, Tuttle, and Dwoyer, 2010) where the 
outcomes of students who applied and were admitted to charter schools through randomized 
admissions lotteries (lottery winners) were compared with the outcomes of lottery losers in each of 
36 school districts across 15 states. Blocking can improve the precision of the ATE estimators if the 
blocking is based on characteristics associated with the potential outcomes of interest. Blocking can 
also improve the generalizability of study findings because a “mini-experiment” is replicated across 
different blocks. 

For blocked designs, the input data file in RCT-YES must contain a variable that indicates the block 
for each student; otherwise the program will not perform the analysis. The true identities of the 
blocks can be masked. The name of the blocking variable must be specified using the BLOCK_ID 
input variable. If the design involves pairwise matching, the input variable MATCHED_PAIR 
should be set to 1 (see Table 2). 

RCT-YES uses the following rules for including blocks for the full sample and subgroup analyses: 

• For the FP models, blocks are included in the analysis only if (1) they contain at least 2 
treatment and 2 control students with available outcomes and (2) the outcomes vary across 
students in the block for at least one research group. 

• For the SP models and the FP models with the BLOCK_FE=1 option, blocks are included 
if they contain at least 1 treatment and 1 control student with available outcome data. 
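
The two inclusion rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not RCT-YES code: the function name `eligible_blocks`, the `relaxed` flag (standing in for the SP models and the BLOCK_FE=1 option), and the record layout are all hypothetical.

```python
from collections import defaultdict


def eligible_blocks(records, relaxed=False):
    """Return the sorted IDs of blocks that satisfy the inclusion rule.

    records: iterable of (block_id, treated, outcome) tuples, where
             outcome is None when the outcome is missing.
    relaxed: False applies the default FP rule; True applies the rule
             described for the SP models and the BLOCK_FE=1 option.
    """
    # Collect the available (non-missing) outcomes by block and research group.
    by_block = defaultdict(lambda: {1: [], 0: []})
    for block_id, treated, outcome in records:
        if outcome is not None:
            by_block[block_id][1 if treated else 0].append(outcome)

    keep = []
    for block_id, arms in by_block.items():
        n_t, n_c = len(arms[1]), len(arms[0])
        if relaxed:
            ok = n_t >= 1 and n_c >= 1
        else:
            # Outcomes must vary within at least one research group.
            varies = any(len(set(arm)) > 1 for arm in arms.values())
            ok = n_t >= 2 and n_c >= 2 and varies
        if ok:
            keep.append(block_id)
    return sorted(keep)
```

Under the default rule, a block with two treatment and two control students whose outcomes are all identical is dropped; under the relaxed rule the same block is retained.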

Differences between the ATE estimators for the FP and SP models are more pronounced for blocked 
designs than for non-blocked designs. Thus, in what follows, we first discuss simple 
differences-in-means and regression-adjusted estimators for the FP models, and then discuss these 
estimators for the SP model and designs with pairwise matching. The discussion focuses on ATE 
estimators and their standard errors; additional methodological topics discussed in detail in 
Chapter 5 are discussed only if they differ from those presented earlier. 

For the analysis, we use similar notation as in Chapter 5 with the addition of the subscript “b” to 
indicate blocks. Thus, for example, $n_b$ is the number of students in block $b$, $Y_{ib}(1)$ and 
$Y_{ib}(0)$ are potential outcomes for student $i$ in block $b$, $p_b$ is the block-specific 
treatment group sampling rate, $T_{ib}$ is the treatment status indicator variable, and so on. We 
assume that there are $h$ blocks in the sample and define $S_{ib}$ to be a block indicator variable 
that equals 1 if student $i$ is in block $b$ and 0 for students in other blocks. 


a. FP model without baseline covariates 

In this section, we discuss simple differences-in-means estimators for the FP model, first for the full 
sample analysis and then for the subgroup analysis. 


Full sample analysis 

For the non-clustered, blocked design, the ATE parameter for block b is 


(6.1) $\beta_{nclus,b,FP} = \bar{Y}_b(1) - \bar{Y}_b(0)$, 

where $\bar{Y}_b(1)$ and $\bar{Y}_b(0)$ are mean potential outcomes (that are assumed to be fixed 
for the study). The ATE parameter across all blocks can then be expressed as 


(6.2) $\beta_{nclus,blocked,FP} = \dfrac{\sum_{b=1}^{h} w_b \, \beta_{nclus,b,FP}}{\sum_{b=1}^{h} w_b}$, 

which is a weighted average of block-specific ATEs with weights $w_b$. The default weights in 
RCT-YES are $w_b = n_b$, so that blocks are weighted by their student sample sizes. Stated 
differently, each student is given equal weight in the analysis (that is, $w_{ib} = 1$). If blocks 
are sites, another weighting scheme is to weight each block equally ($w_b = 1$; 
$w_{ib} = (1/n_b)$). This approach yields the ATE parameter for a student in the average site in 
the study rather than for the average student in the study. This approach might be desirable in 
evaluations where the site size distribution is highly skewed, so that very large sites do not have 
an undue influence on the pooled impact estimates. This approach can be performed in RCT-YES by 
including a weight in the input data file that is set to $w_{ib} = (1/n_b)$ for each student. 
RCT-YES allows users to specify different weighting variables for each outcome. 
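
To make the difference between the two weighting schemes concrete, the following minimal sketch computes the weighted average of block-specific effects under student weighting ($w_b = n_b$) and under equal site weighting ($w_b = 1$). The function name `pooled_ate` and the numbers are illustrative assumptions, not values from any study.

```python
def pooled_ate(block_effects, block_weights):
    """Weighted average of block-specific ATEs with block weights w_b."""
    total_weight = sum(block_weights)
    weighted_sum = sum(w * b for w, b in zip(block_weights, block_effects))
    return weighted_sum / total_weight


# Hypothetical block-specific ATEs and block sample sizes (one very large site).
effects = [0.10, 0.30, 0.50]
sizes = [800, 150, 50]

student_weighted = pooled_ate(effects, sizes)       # w_b = n_b
site_weighted = pooled_ate(effects, [1, 1, 1])      # w_b = 1

print(round(student_weighted, 4), round(site_weighted, 4))
```

With a skewed size distribution such as this one, the student-weighted parameter is pulled toward the effect in the largest site, while the site-weighted parameter treats the three sites symmetrically.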

In a blocked design, random assignment is conducted separately within each block, where sampling 
rates to the treatment group, $p_b$, could differ across blocks. Thus, the ATE estimators for the FP 
model discussed in Chapter 5 apply to each block separately (see also Imbens and Rubin, 
forthcoming). Accordingly, an unbiased simple differences-in-means estimator for the ATE 
parameter in block $b$ is $\hat{\beta}_{nclus,b,FP} = (\bar{y}_{Tb} - \bar{y}_{Cb})$, and an unbiased 
estimator for the pooled ATE parameter in (6.2) is 

(6.3) $\hat{\beta}_{nclus,blocked,FP} = \dfrac{\sum_{b=1}^{h} w_b (\bar{y}_{Tb} - \bar{y}_{Cb})}{\sum_{b=1}^{h} w_b} = \dfrac{\sum_{b=1}^{h} w_b \, \hat{\beta}_{nclus,b,FP}}{\sum_{b=1}^{h} w_b}$. 


Because the samples across blocks are independent, the variance of the blocked ATE estimator is 

(6.4) $Var_R(\hat{\beta}_{nclus,blocked,FP}) = \dfrac{\sum_{b=1}^{h} w_b^2 \, Var_R(\hat{\beta}_{nclus,b,FP})}{\left(\sum_{b=1}^{h} w_b\right)^2}$, 

and an upper bound estimator for $Var_R(\hat{\beta}_{nclus,b,FP})$ is 

(6.5) $\widehat{Var}_R(\hat{\beta}_{nclus,b,FP}) = \dfrac{s_{Tb}^2}{n_b p_b} + \dfrac{s_{Cb}^2}{n_b (1 - p_b)} - \dfrac{(s_{Tb} - s_{Cb})^2}{n_b}$, 

where $s_{Tb}^2$ and $s_{Cb}^2$ are block-specific sample variances. Furthermore, 
$\hat{\beta}_{nclus,blocked,FP}$ is asymptotically normal as the number of students per block goes 
to infinity because it is a weighted sum of independent, asymptotically normal random variables. 
Thus, t-tests for the pooled estimator can be used for hypothesis testing with 
$(\sum_{b=1}^{h} (n_{Tb} + n_{Cb}) - 2h)$ degrees of freedom (for block-specific estimates, the 
degrees of freedom for t-tests is $(n_{Tb} + n_{Cb} - 2)$). 
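
The pooled estimator, its variance, and the degrees of freedom can be sketched as follows. This is an illustrative reading of (6.3) to (6.5), not the RCT-YES implementation: the helper names are hypothetical, and the block-level variance is assumed to take the form $s_{Tb}^2/n_{Tb} + s_{Cb}^2/n_{Cb} - (s_{Tb} - s_{Cb})^2/n_b$, where the last term is a conservative adjustment for the FP heterogeneity term.

```python
from statistics import mean, variance


def block_summary(y_treat, y_control):
    """Impact, upper-bound variance, and size for one block.

    Assumes at least 2 students per research group (needed for variance()).
    """
    n_t, n_c = len(y_treat), len(y_control)
    n_b = n_t + n_c
    s_t = variance(y_treat) ** 0.5      # sample standard deviation, treatment
    s_c = variance(y_control) ** 0.5    # sample standard deviation, control
    impact = mean(y_treat) - mean(y_control)
    # Assumed form of the block-level upper bound variance estimator.
    var = s_t ** 2 / n_t + s_c ** 2 / n_c - (s_t - s_c) ** 2 / n_b
    return impact, var, n_b


def pooled_fp(blocks, weights=None):
    """Pooled ATE, variance, and t-test degrees of freedom across blocks.

    blocks:  list of (y_treat, y_control) outcome lists, one pair per block.
    weights: block weights w_b; defaults to the block sample sizes n_b.
    """
    summaries = [block_summary(y_t, y_c) for y_t, y_c in blocks]
    if weights is None:
        weights = [n_b for _, _, n_b in summaries]
    w_sum = sum(weights)
    ate = sum(w * imp for w, (imp, _, _) in zip(weights, summaries)) / w_sum
    var = sum(w ** 2 * v for w, (_, v, _) in zip(weights, summaries)) / w_sum ** 2
    df = sum(n_b for _, _, n_b in summaries) - 2 * len(summaries)
    return ate, var, df
```

For example, `pooled_fp([([3, 5], [1, 3]), ([4, 6, 8], [2, 4, 6])])` pools two blocks using their sample sizes as weights and returns 6 degrees of freedom (10 students minus 2 per block).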


The simple differences-in-means estimator can also be obtained using a regression framework for the 
Neyman-Rubin-Holland model. This can be done by specifying a regression model for each block 
using (5.4) from Chapter 5 and then aggregating these models using the relation 
$y_i = \sum_{b=1}^{h} S_{ib} y_{ib}$, where $S_{ib}$ is a block indicator variable. This yields the 
following pooled regression model: 

(6.6) $y_i = \sum_{b=1}^{h} \beta_{nclus,b,FP} \, S_{ib} \tilde{T}_{ib} + \sum_{b=1}^{h} \delta_b S_{ib} + u_i$, 

where $\tilde{T}_{ib} = (T_{ib} - p_b)$ are centered treatment status indicators, $\delta_b$ are 
block-specific intercepts, and $u_i = \sum_{b=1}^{h} S_{ib} u_{ib}$ is the pooled error term based 
on (5.4a). Note that we include terms for all $h$ sites in the model and exclude the intercept term. 

The OLS estimator for the ATE parameter in block $b$ is 
$\hat{\beta}_{nclus,b,FP} = (\bar{y}_{Tb} - \bar{y}_{Cb})$; this estimator is unbiased and 
asymptotically normal and its variance can be estimated using (6.5). The proof of this result is 
very similar to the proof for Lemma 5.5 for the subgroup analysis and is not repeated here; the 
main difference between the proofs is that student sample sizes are fixed for the blocked analysis 
but are random for the subgroup analysis. The pooled ATE estimator can then be obtained using 
(6.4). 

To help interpret results from a blocked RCT design, it is often helpful to examine the variation in 
estimated treatment effects across blocks. For instance, study findings could have different policy 
implications if the impact estimates are consistent across blocks than if they vary considerably across 
the mini-experiments. Furthermore, examining block-specific impact estimates provides information 
on the extent to which different choices for the weights, $w_b$, can lead to differences in the pooled 
findings. 

RCT-YES does not report ATE estimates for each block due to data disclosure concerns that could 
arise for small blocks. Instead, the program reports summary statistics on the block-specific impact 
estimates. In addition, the program conducts a joint chi-squared test to assess whether differences 
between the estimated block impacts are statistically significant (see below). 

RCT-YES uses an identical chi-squared statistic for assessing differences between the block impact 
estimates as the statistic in (5.35) for assessing differences between the subgroup impact estimates 
for Design 1. In the present context, the program calculates the chi-squared statistic 
$(R\hat{\lambda})'(R\hat{\Phi}_{\lambda}R')^{-1}(R\hat{\lambda})$, where $\hat{\lambda}$ is a 
$h \times 1$ vector of block-specific ATE estimates; $\hat{\Phi}_{\lambda}$ is the associated 
estimated variance-covariance matrix, which is diagonal due to the independence of the 
block-specific estimates; and the $(h-1) \times h$ matrix $R$ is identical to the corresponding 
matrix in (5.35). The chi-squared statistic is distributed as $\chi^2$ with $(h-1)$ degrees of freedom. 
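
A minimal sketch of this test statistic follows. It relies on a standard algebraic fact rather than on the specific matrix $R$ in (5.35), which is not reproduced here: when the variance-covariance matrix is diagonal, the Wald statistic for any full-rank set of $(h-1)$ contrasts equals the precision-weighted sum of squared deviations of the block estimates from their precision-weighted mean. The function name `block_impact_chi2` is hypothetical.

```python
def block_impact_chi2(estimates, variances):
    """Chi-squared statistic and degrees of freedom for equal block impacts.

    estimates: block-specific ATE estimates.
    variances: their estimated variances (independent across blocks).
    """
    precisions = [1.0 / v for v in variances]
    # Precision-weighted mean of the block-specific estimates.
    pooled = sum(p * e for p, e in zip(precisions, estimates)) / sum(precisions)
    # Precision-weighted sum of squared deviations from that mean.
    stat = sum(p * (e - pooled) ** 2 for p, e in zip(precisions, estimates))
    return stat, len(estimates) - 1
```

The returned statistic can be compared against a chi-squared distribution with the returned degrees of freedom using any statistical table or library.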

Finally, a common regression approach for blocked designs is to include as explanatory variables the 
treatment status indicator variable ($T_i$) and the block indicators ($S_{ib}$) but not the 
treatment-by-block interaction terms. It is useful to parameterize this model as follows: 

(6.7) $y_i = \alpha_1 T_i + \sum_{b=1}^{h} \delta_b S_{ib} + e_i$, 

where $\delta_b$ are block effects and $e_i$ is the error term. The OLS estimator for the impact 
parameter $\alpha_1$ in (6.7) is 

(6.8) $\hat{\alpha}_1 = \dfrac{\sum_{b=1}^{h} \sum_{i=1}^{n_b} (T_{ib} - p_b) y_{ib}}{\sum_{b=1}^{h} \sum_{i=1}^{n_b} (T_{ib} - p_b)^2} = \dfrac{\sum_{b=1}^{h} n_b p_b (1 - p_b)(\bar{y}_{Tb} - \bar{y}_{Cb})}{\sum_{b=1}^{h} n_b p_b (1 - p_b)}$. 

In general, this estimator is biased for the FP ATE parameter because it is a weighted average of 
block-specific impacts with weights $n_b p_b (1 - p_b)$. It is informative to express these weights 
as $((1/n_{Tb}) + (1/n_{Cb}))^{-1}$, which can be interpreted as the inverse of block-specific 
variances of the simple differences-in-means estimators, where the variances of the outcome 
measures are assumed to be the same in each block (which is the typical assumption in OLS models). 
Thus, the use of these weights is a form of precision weighting. 

Using results from the regression analysis in Chapter 5 and Imbens and Rubin (Chapter 9, Theorem 
1), this estimator can be shown to be asymptotically normal with an asymptotic variance that can be 
estimated as follows: 

(6.9) $\widehat{AsyVar}_R(\hat{\alpha}_1) = \dfrac{\cdots}{\left[\sum_{b=1}^{h} p_b (1 - p_b) q_b\right]^2}$, 

where $q_b = (n_b / n)$ is the proportion of the total sample that is in block $b$. For this 
estimator, the degrees of freedom for hypothesis testing is 
$(\sum_{b=1}^{h} (n_{Tb} + n_{Cb}) - h - 1)$. RCT-YES does not include the FP heterogeneity term 
in this expression. 

This estimation approach can be specified in RCT-YES using the BLOCK_FE=1 specification. The 
approach seems more appropriate for estimating a SP population ATE parameter to help maximize 
precision of the estimates than a FP population parameter that is concerned with treatment effects 
for the average student (or block) in the sample. Nonetheless, it has the advantage that it only 
requires 1 treatment and 1 control group member per block for variance estimation, and could be a 
parsimonious specification for designs with small blocks. Furthermore, degrees of freedom losses are 
smaller. For this specification, RCT-YES does not provide information on block-level impact 
estimates, because only the pooled impact estimate is calculated. 
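
The precision-weighting interpretation of the block fixed effects estimator can be checked numerically. The sketch below is illustrative only (function names and data are hypothetical): it computes the treatment coefficient from a regression on the treatment indicator and block indicators by demeaning both variables within blocks (the standard Frisch-Waugh-Lovell equivalence), and compares it with the weighted average of block impacts using weights $n_b p_b (1 - p_b)$.

```python
def fe_estimate(blocks):
    """OLS treatment coefficient with block indicators, via within-block demeaning.

    blocks: list of (y_treat, y_control) outcome lists, one pair per block.
    """
    num = den = 0.0
    for y_t, y_c in blocks:
        n_t, n_c = len(y_t), len(y_c)
        p_b = n_t / (n_t + n_c)                       # block treatment share
        y_bar = (sum(y_t) + sum(y_c)) / (n_t + n_c)   # block outcome mean
        for y in y_t:                                 # T = 1, demeaned T = 1 - p_b
            num += (1 - p_b) * (y - y_bar)
            den += (1 - p_b) ** 2
        for y in y_c:                                 # T = 0, demeaned T = -p_b
            num += (0 - p_b) * (y - y_bar)
            den += p_b ** 2
    return num / den


def precision_weighted(blocks):
    """Weighted average of block impacts with weights n_b * p_b * (1 - p_b)."""
    num = den = 0.0
    for y_t, y_c in blocks:
        n_t, n_c = len(y_t), len(y_c)
        n_b = n_t + n_c
        p_b = n_t / n_b
        impact = sum(y_t) / n_t - sum(y_c) / n_c
        weight = n_b * p_b * (1 - p_b)
        num += weight * impact
        den += weight
    return num / den


blocks = [([3, 5, 7], [1]), ([10, 12], [4, 6])]
print(fe_estimate(blocks), precision_weighted(blocks))
```

In this made-up example both blocks have 4 students, so a size-weighted average of the block impacts (4 and 6) would be 5, whereas the fixed effects estimator gives more weight to the block with the balanced treatment-control split.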
Subgroup analysis 

The same methods discussed above for estimating impacts for the full sample can be used to estimate 
impacts for baseline subgroups under the blocked design. If we denote subgroups using the subscript 
“g”, the simple differences-in-means ATE estimator for a subgroup is 

(6.10) $\hat{\beta}_{nclus,g,blocked,FP} = \dfrac{\sum_{b=1}^{h} w_{gb} (\bar{y}_{Tgb} - \bar{y}_{Cgb})}{\sum_{b=1}^{h} w_{gb}}$. 

The asymptotic variance estimator of this ATE estimator is 

(6.11) $\widehat{AsyVar}_R(\hat{\beta}_{nclus,g,blocked,FP}) = \dfrac{\sum_{b=1}^{h} w_{gb}^2 \, \widehat{AsyVar}_R(\hat{\beta}_{nclus,g,b,FP})}{\left(\sum_{b=1}^{h} w_{gb}\right)^2}$, 

where the block-specific variance estimators are based on $s_{Tgb}^2$ and $s_{Cgb}^2$, the 
block-specific sample variances for subgroup $g$, $p_{gb} = (n_{Tgb} / n_{gb})$, and 
$q_{gb} = (n_{gb} / n_b)$. Note that the default block-level weights for the subgroup analysis 
($w_{gb}$) can differ from the default weights for the full sample analysis ($w_b$) if some 
subgroups (for example, English language learners) are concentrated in some sites. RCT-YES allows 
users to specify different weights for each subgroup. 

Hypothesis testing for the pooled estimator can be conducted using t-tests with 
$(\sum_{b=1}^{h} (n_{Tgb} + n_{Cgb}) - 2h)$ degrees of freedom. Tests of subgroup differences in 
impacts can be conducted using the chi-squared statistic in (5.35), where the variance-covariance 
matrix, $\hat{\Phi}_{\lambda}$, is calculated using (6.11). 

If the BLOCK_FE=1 option is used, RCT-YES estimates the following regression model using OLS: 

(6. 12) ± +f^± + n , , 

g=l h=l g=l 
where $\tilde{T}_{ig} = (T_i - \sum_{b=1}^{h} S_{ib} p_{gb})$, $p_{gb}$ is the observed subgroup 
sampling rate to the treatment group, and $\eta_i$ is the error term. This model controls for main 
block effects but excludes block-related interaction terms. In this specification, the ATE 
parameter for subgroup $g$ is $\beta_g = \beta_{nclus,g,blocked,MR,FP}$. RCT-YES applies the 
following new asymptotic variance estimator for $\hat{\beta}_g$: 


h "gb 
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where Ug is the subgroup sample size, Pg^^ = / blgA , and — (jlg^ • P°r this estimator, 

the degrees of freedom for hypothesis testing is • 


Using nonresponse weights 

Weights to adjust for missing data (or other reasons) can be incorporated into the weight variable. 
RCT-YES uses the student-level weights, $w_{ib}$, to adjust for nonresponse within blocks and uses 
the block-level weights, $w_b = \sum_{i=1}^{n_b} w_{ib}$, to adjust for nonresponse at the block 
level (if germane). RCT-YES conducts the within-block nonresponse adjustments using the methods 
from Chapter 5 for Design 1 that are applied to each block separately. 

If nonresponse weights are specified using the BLOCK_FE=1 option, RCT-YES estimates (6.7) using 
weighted least squares. Under this specification, the ATE estimator is a weighted average of 
block-specific impact estimates (adjusted for nonresponse) with weights 
$p_{bw}(1 - p_{bw}) \sum_{i=1}^{n_b} w_{ib}$, where 
$p_{bw} = (\sum_{i=1}^{n_b} T_{ib} w_{ib} / \sum_{i=1}^{n_b} w_{ib})$. RCT-YES uses the following 
variance estimator for this specification: 


b "h 
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where w* = (ZA^/* same approach is used for the subgroup analysis using the 

corresponding variance expression in (6.13). 
Assessing baseline equivalence 

To assess baseline equivalence under the blocked design, RCT-YES conducts t-tests for each specified 
baseline variable using the same methods as discussed above for estimating ATEs on study outcomes 
except that it uses the pooled variance estimator in (5.39) or (5.52) for each block if BLOCK_FE=0. 
To test the hypothesis that covariate means are jointly similar, RCT-YES uses Hotelling’s T-squared 
statistic in (5.40b). The variance-covariance matrix for this joint test could be computed using 
$\hat{V}_{\delta,blocked} = \sum_{b=1}^{h} w_b^2 \hat{V}_b / (\sum_{b=1}^{h} w_b)^2$, where 
$\hat{V}_b$ is the estimated variance-covariance matrix among the baseline covariates in block $b$ 
(see (5.40a)). RCT-YES, however, ignores the blocking and estimates $V_{\delta}$ using the approach 
for Design 1; we adopted this approach because of the potential for small sample sizes in some 
blocks that could yield unstable estimates of $\hat{V}_b$ if there are a sizeable number of 
baseline variables specified for the analysis. This same approach for the joint test is used for 
the BLOCK_FE=1 option. 

b. FP model with baseline covariates 

To examine regression-adjusted estimators under the Neyman-Rubin-Holland model for the blocked 
design, we use similar notation as in Chapter 5 for Design 1 and define $x_{ib}$ to be a 
$1 \times v$ vector of fixed baseline covariates for student $i$ in block $b$. One approach for 
conducting the analysis is to run separate regression models for each block. RCT-YES, however, does 
not use this approach because of potential estimation problems for small blocks. Instead, as 
discussed in this section, RCT-YES follows the approach used for the subgroup analysis for Design 1 
by estimating full-sample regression models that include block-by-treatment status interaction terms. 

Full sample analysis 

To examine the regression approach with covariates for the full sample, we use the regression model 
in (6.6) where the explanatory variables include the centered covariates, 
$\tilde{x}_{ib} = (x_{ib} - \bar{x})$, with associated parameter vector $\gamma$. To be parallel 
with previous results, we discuss the asymptotic moments of the multiple regression estimator by 
focusing on an SP parameter that is closely related to the FP parameter, where blocks are assumed 
to be fixed for the study, but where students within blocks are assumed to be randomly sampled from 
a broader population. The considered parameter is the cluster average treatment effect (CATE) 
parameter for block $b$: $\beta_{nclus,b,CATE} = E_I(Y_{ib}(1) - Y_{ib}(0))$. 

Lemma 6.1 presents asymptotic moments of the regression estimator for the CATE parameter for 
block $b$. The results are similar to those from Lemma 5.5 for the subgroup analysis for the 
non-blocked design (Design 1); the key differences are (1) sample sizes are fixed for the block 
analysis but not for the subgroup analysis and (2) treatment group sampling rates, $p_b$, can 
differ across blocks. Thus, for simplicity, we present the new results in Lemma 6.1 in less detail 
than for Lemma 5.5 (by omitting the regularity conditions) and do not repeat the proof. 

Lemma 6.1. Let $\hat{\beta}_{nclus,b,MR,CATE}$ be the multiple regression estimator for the CATE 
parameter for block $b$. Then, $\hat{\beta}_{nclus,b,MR,CATE}$ is asymptotically normal with 
asymptotic mean $\beta_{nclus,b,CATE}$ and asymptotic variance: 
Based on this lemma, for Design 2, RCT-YES uses the following variance estimator for the FP ATE 
estimator, $\hat{\beta}_{nclus,b,MR,FP}$: 

The regression-adjusted block-specific impact estimates and variances can then be weighted to yield 
overall ATE estimates for the Design 2 FP parameter using the expressions in (6.3) and (6.4). 
RCT-YES conducts hypothesis testing for the pooled estimator using t-tests with 

degrees of freedom (for block-specific t-tests, the number of degrees of 
freedom is («j.^ -2)). Nonresponse weights can be incorporated into the analysis by 
estimating a weighted regression model using the weights $w_{ib}$ and using similar variance 
estimators as in (5.51) for the subgroup analysis. 

If the BLOCK_FE=1 option is specified, RCT-YES estimates the OLS model in (6.7) where the 
explanatory variables include the centered covariates, $\tilde{x}_{ib}$. The program uses the 
following variance estimator for this specification: 
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For this estimator, the degrees of freedom for hypothesis testing is 

If nonresponse weights are included in the model for this specification, RCT-YES uses weighted 
least squares and a variance estimator analogous to (6.14). 


Subgroup analysis 

To incorporate covariates into the subgroup analysis for the blocked design, RCT-YES includes in 
the regression model three-way interaction terms between the block, subgroup, and treatment status 
indicators. Specifically, RCT-YES estimates the following regression model where the centered 
covariates are added as additional model regressors: 

(6.18) y,=tt + 1 1 m , 

6=1 g=l 6=1 g=l 


where $\tilde{T}_{igb} = (T_i - p_{gb})$ and $\eta_i$ is the error term. In this formulation, 
$\beta_{gb} = \beta_{nclus,g,b,MR,FP}$ is the ATE parameter for subgroup $g$ in block $b$. The 
variance estimator for $\hat{\beta}_{gb}$ is 
The regression-adjusted block-specific impact estimates and variances can then be weighted to yield 
overall ATE estimates for each subgroup. RCT-YES conducts hypothesis testing for the pooled 
subgroup estimator using t-tests with $(\sum_{b=1}^{h} (n_{Tgb} + n_{Cgb}) - v q_g - 2h)$ degrees 
of freedom, where $q_g = (n_g / n)$ is the proportion of all students in subgroup $g$. Nonresponse 
weights can be incorporated into the analysis by estimating a weighted regression model and using 
similar variance estimators as in (5.51). 

Finally, if the BLOCK_FE=1 option is specified, RCT-YES estimates (6.12) using the centered 
covariates and calculates the following variance estimator: 


h 'V 
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where ^ ) • Foi” this estimator, the degrees of freedom for hypothesis testing is 

^"^bS^Tgb ~^^cgb)~^^g If non-response weights are used, RCT-YES uses a subgroup version 

of (6.14) for variance estimation. 

c. SP model without baseline covariates 

For Design 2, the SP model yields different ATE parameters depending on researcher assumptions 
about the multilevel sampling of study blocks and/or students within study blocks from broader 
populations. By default, RCT-YES focuses on the PATE parameter that assumes random sampling 
at all levels. This design could be germane, for example, in multisite evaluations that include many 
school districts dispersed across a broad area targeted for the intervention. RCT-YES can also estimate 
the CATE parameter (random sampling of students, but not blocks) and the UATE parameter 
(random sampling of blocks, but not students) if the CATE_UATE option is specified as a program 
input (see Table 2 in Chapter 2). 

Importantly, for SP designs with multiple stages of actual or assumed random sampling of blocks and 
sub-blocks, RCT-YES users should specify the BLOCK_ID for the highest sampling level, because 
adjusting for variances at higher sampling levels incorporates variances for lower sampling levels. For 
example, consider a multistage blocked SP design where (1) districts are randomly sampled for the 
study (random blocks), (2) schools are randomly sampled within the study districts (random 
sub-blocks), and (3) students are randomly assigned to the treatment or control groups within the 
study schools. To estimate the PATE parameter for this design, RCT-YES users should treat districts 
(the highest sampling level) as the random block and specify the BLOCK_ID as the district 
identifier, not the identifier for the lower level school sub-blocks. 

For the statistical analysis, we assume infinite super-populations so that finite sample corrections do 
not apply (this approach yields conservative variance estimators). In practice, however, users may 
want to assume random sampling from finite sample universes. Thus, RCT-YES allows weights to 
differ across blocks. 

Consider first the CATE parameter pooled across blocks, which can be expressed as 
$\beta_{nclus,blocked,CATE} = (\sum_{b=1}^{h} w_b \beta_{nclus,b,CATE} / \sum_{b=1}^{h} w_b)$. We 
already considered estimators for this parameter in Section 6b where we discussed the similar FP 
parameter for Design 2. The key differences between the CATE and FP estimators are that the 
block-specific weights could differ because the CATE weights might reflect block population sizes 
rather than block sample sizes, and the FP heterogeneity term does not enter the variance estimator 
for the CATE parameter. The BLOCK_FE=1 option can be specified for the CATE parameter. In this 
case, RCT-YES uses the variance estimator in (6.9). 

The situations are more complex for the PATE and UATE parameters, which yield random block 
designs. In what follows, we first discuss the PATE parameter in detail and then briefly discuss the 
UATE parameter as a special case of the PATE parameter. 

The PATE parameter for Design 2 is 

(6.21) $\beta_{nclus,blocked,PATE} = E_{IB}(Y_{ib}(1) - Y_{ib}(0))$, 

which is the expected value of the treatment effect in the super-population of students ($I$) within 
the super-population of blocks ($B$). To examine this parameter further, let 
$\mu_{Tb} = E_I(Y_{ib}(1))$ and $\mu_{Cb} = E_I(Y_{ib}(0))$ be mean potential outcomes in $I$ for 
block $b$ and let $\sigma_{Tb}^2 = Var_I(Y_{ib}(1))$ and $\sigma_{Cb}^2 = Var_I(Y_{ib}(0))$ be 
corresponding SP variances. We can then express the (asymptotic) PATE parameter as 

(6.21a) $\beta_{nclus,blocked,PATE} = \dfrac{E_B(w_b(\mu_{Tb} - \mu_{Cb}))}{E_B(w_b)}$, 

where the denominator is the average block size in the super-population ($0 < E_B(w_b) < \infty$). 
In RCT-YES, the default value for $w_b$ is the block sample size ($n_b$), but a broader measure of 
the block population size might be more appropriate for the SP model. 

As discussed next, the PATE parameter for the blocked design can be estimated consistently using a 
simple differences-in-means approach, but the variance estimator is somewhat different from those 
considered thus far because it represents the extent to which the estimated block-specific ATEs vary 
across blocks due to the assumed random sampling of blocks. 




Full sample analysis for the PATE parameter 

Consider the simple differences-in-means estimator for the PATE parameter: 
To show that this estimator is consistent, we can use the law of iterated expectations in several stages 
by: (1) averaging over the randomization distribution ($R$) conditional on the sample of students and 
sites; (2) averaging the resulting estimator over all possible samples of students from $I$ conditional 
on the sample of sites; and (3) averaging over all possible samples of sites from $B$. Using this 
approach, we find that 

which shows that $\hat{\beta}_{nclus,blocked,PATE}$ is a consistent estimator for the PATE parameter 
as the number of blocks, $h$, approaches infinity. 

The following new lemma presents the asymptotic properties of $\hat{\beta}_{nclus,blocked,PATE}$. 
The proof is in Appendix A and adapts the approach in Imai, King, and Nall (2009) to the current 
context. 

Lemma 6.2. Let $\hat{\beta}_{nclus,blocked,PATE}$ be the weighted simple differences-in-means 
estimator in (6.22) for the PATE parameter in (6.21). Then, as the number of blocks, $h$, approaches 
infinity, $\hat{\beta}_{nclus,blocked,PATE}$ is consistent and asymptotically normal with asymptotic 
variance: 

A consistent estimator for the variance in (6.24) is 


I ” „ 
This variance estimator represents the extent to which the estimated ATEs vary across blocks. 
Intuitively, if the experiment were re-run multiple times, a different set of blocks would be selected 
each time along with their associated treatment effects. Thus, the relevant variance term is the extent 
to which impacts vary across blocks. This differs from the FP model, where the concern is with 
the variances of student outcomes within blocks. 

RCT-YES conducts hypothesis testing for this specification using t-tests with $(h-1)$ degrees of 
freedom. The degrees of freedom are based on the number of blocks because blocks are the assumed 
primary sampling unit. 
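
The idea of a between-block variance estimator can be illustrated with a short sketch. The exact expression in (6.25) is not reproduced in this text, so the code below assumes one common form in the spirit of Imai, King, and Nall (2009): the sum of squared deviations of $w_b \hat{\beta}_b$ from $\bar{w}\hat{\beta}$, divided by $h(h-1)\bar{w}^2$, where $\bar{w}$ is the mean block weight. Both the function name `pate_estimate` and this functional form are assumptions for illustration.

```python
def pate_estimate(block_effects, weights):
    """Weighted differences-in-means PATE estimate with a between-block variance.

    block_effects: estimated block-specific impacts (treatment minus control means).
    weights:       block weights w_b.
    Returns (estimate, variance, degrees_of_freedom).
    """
    h = len(block_effects)
    w_bar = sum(weights) / h
    ate = sum(w * b for w, b in zip(weights, block_effects)) / sum(weights)
    # Assumed between-block variance form (see the lead-in above).
    ss = sum((w * b - w_bar * ate) ** 2 for w, b in zip(weights, block_effects))
    var = ss / (h * (h - 1) * w_bar ** 2)
    return ate, var, h - 1
```

With equal block weights, this form reduces to the sample variance of the block impacts divided by the number of blocks, which matches the intuition that only variation in impacts across blocks matters.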

For the baseline equivalency analysis for the PATE parameter, RCT-YES uses (6.25) that is applied 
to each baseline covariate. Similarly, to test the hypothesis that covariate means are jointly similar, 
RCT-YES uses Hotelling’s T-squared statistic in (5.40b) where the covariances between the baseline 
covariates k and k' are estimated as follows: 

Subgroup analysis for the PATE parameter 

Similar methods for estimating the PATE parameter for the full sample can be used for the subgroup 
analysis: the simple differences-in-means estimator in (6.22) and the variance estimator in (6.25) can 
be applied separately for each subgroup. RCT-YES users can specify different block-level weights 
($w_{gb}$) for different subgroups. RCT-YES performs the chi-squared test in (5.35) to test the null 
hypothesis of no differences in treatment effects across subgroups.^ 


The UATE parameter 

The UATE parameter is a special case of the PATE parameter where students are no longer assumed 
to be representative of a broader block population, but only of themselves. This parameter can be 
expressed as follows: 


A 


nclus ,blocked ,UA TE 
and can be interpreted as the expected ATE of students who would be observed if their block were 
sampled for the study from the super-population of blocks. 


^ The estimated subgroup treatment effects within the same block could be correlated. RCT-YES, however, ignores these potential 
correlations under the assumption that block effects are additive, in which case the common block effects cancel in the tests. 
Similar methods to those used in (6.23) for the PATE parameter can be used to show that the simple 
differences-in-means estimator is consistent for the UATE parameter (where averaging is conducted 
sequentially over the randomization distribution and the sampling of sites, but not the sampling of 
students within sites). Furthermore, using methods similar to the proof of Lemma 6.2, the 
asymptotic variance of $\hat{\beta}_{nclus,blocked,UATE}$ can be shown to be 

which can be consistently estimated using (6.25). Thus, estimation methods for the UATE and 
PATE parameters are similar; the only potential difference is the choice of weights. 


d. SP model with baseline covariates 

RCT-YES can incorporate baseline covariates into the SP models to obtain regression-adjusted impact 
estimates. For the CATE parameter, RCT-YES estimates regression-adjusted impacts using the same 
approach as for the FP parameter where (6.6) with centered covariates is estimated using weighted 
least squares. This approach yields consistent ATE estimators for the CATE parameter, which can 
be shown using Lemma 6.1 and arguments similar to those in (6.23). 

RCT-YES uses a two-stage estimation procedure to incorporate baseline covariates for the PATE and 
UATE parameters. First, RCT-YES estimates (6.6) to obtain regression-adjusted, block-specific impact 
estimates, $\hat{\beta}_{nclus,b,MR,PATE}$, if there are a sufficient number of observations.^ 
Second, RCT-YES estimates the following model, where $\hat{\beta}_{nclus,b,MR,PATE}$ is regressed 
on the centered block-specific covariates, $\tilde{X}_b = (X_b - \bar{X})$, using the weights $w_b$: 

(6.27) $\hat{\beta}_{nclus,b,MR,PATE} = \beta_0 + \tilde{X}_b \gamma + \varepsilon_b$, 

where $\bar{X}$ is a vector of weighted covariate means and $\varepsilon_b$ is the error term. This 
specification models the block-specific impacts as a linear function of block-specific covariate 
values. Due to the centering of the covariates, we have that 
$\hat{\beta}_0 = \hat{\beta}_{nclus,blocked,MR,PATE}$, which is the regression-adjusted impact 
estimator for the PATE parameter. Note that the inclusion of $\tilde{X}_b$ does not change the 
impact estimate. Using similar methods to those in Lemma 6.2, it can be shown that 
$\hat{\beta}_0$ is asymptotically normal with an 


^ The condition is that the total sample size of treatments and controls is greater than 4h, which is the minimum sample size for the 
default FP model for Design 2. 
asymptotic variance that can be consistently estimated using predicted values from the fitted model 
in (6.27): 

A 1 ^ /V ~ 

(6.28) AsyVarj^jM) = - 

—1 ^^bPnclus.b.MR.PATE w(A+Xby))^ 
where $v$ is the number of covariates in (6.27). Imbens and Rubin (2015) discuss a similar estimator 
for matched pair designs without weights, but not for the current context. 
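
The role of covariate centering in the second-stage regression can be seen in a small sketch. The code below is illustrative and uses a single block-level covariate with hypothetical names and data: it solves the weighted least squares normal equations directly and shows that, when the covariate is centered at its weighted mean, the fitted intercept equals the weighted mean of the block-specific impact estimates.

```python
def second_stage(block_impacts, block_covariate, weights):
    """WLS of block impacts on an intercept and one centered block covariate.

    Returns (intercept, slope). The covariate is centered at its weighted mean.
    """
    w_sum = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, block_covariate)) / w_sum
    x_c = [x - x_bar for x in block_covariate]

    # Weighted sums that enter the 2x2 normal equations.
    s_wx = sum(w * x for w, x in zip(weights, x_c))
    s_wxx = sum(w * x * x for w, x in zip(weights, x_c))
    s_wy = sum(w * y for w, y in zip(weights, block_impacts))
    s_wxy = sum(w * x * y for w, x, y in zip(weights, x_c, block_impacts))

    # Solve the normal equations with Cramer's rule.
    det = w_sum * s_wxx - s_wx ** 2
    intercept = (s_wy * s_wxx - s_wx * s_wxy) / det
    slope = (w_sum * s_wxy - s_wx * s_wy) / det
    return intercept, slope


impacts = [1.0, 2.0, 4.0]
covariate = [0.0, 1.0, 3.0]
weights = [2.0, 1.0, 1.0]
print(second_stage(impacts, covariate, weights))
```

In this made-up example the weighted mean of the block impacts is 2, and the fitted intercept reproduces that value regardless of the slope on the covariate.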

An alternative specification is to omit the first stage regression and to include in (6.27) the 
vector of differences between the block-specific covariate values for the treatment and control 
groups, $d_b = (X_{Tb} - X_{Cb})$. RCT-YES, however, does not adopt this approach to minimize 
losses in the number of degrees of freedom for hypothesis testing. 

For the subgroup analysis, RCT-YES estimates the following variant of (6.27) using block-level data 
stacked for each subgroup: 

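(6.29) $\hat{\beta}_{nclus,g,b,PATE} = \sum_{g=1}^{s} \beta_g G_{gb} + \tilde{\mathbf{x}}_{gb} \boldsymbol{\gamma} + \eta_{gb}$, 

where $\tilde{\mathbf{x}}_{gb} = (\bar{\mathbf{x}}_{gb} - \bar{\bar{\mathbf{x}}}_{gW})$. In this model, $\hat{\beta}_g = \hat{\beta}_{nclus,g,blocked,MR,PATE}$ is the impact estimator for the 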
PATE parameter for subgroup g , with an asymptotic variance that can be consistently estimated 
using the following expression: 

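(6.30) $\widehat{AsyVar}(\hat{\beta}_g) = \frac{1}{h_g (h_g - v q_g - 1) \bar{w}_g^2} \sum_{b=1}^{h} G_{gb} \left( w_{gb} \hat{\beta}_{nclus,g,b,PATE} - \bar{w}_g (\hat{\beta}_g + \tilde{\mathbf{x}}_{gb} \hat{\boldsymbol{\gamma}}) \right)^2$, 

where $\bar{w}_g = \sum_{b=1}^{h} G_{gb} w_{gb} / h_g$ and $q_g = h_g / h$. 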

Importantly, if the number of blocks is relatively small, only a very small number of covariates can 
be included in the regression model in (6.27); otherwise model over-fitting problems could arise and 
degrees of freedom losses could lead to regression-adjusted estimators that are less precise than the 
unadjusted estimators. As discussed, by default, RCT-YES will only estimate regression models with 
covariates if the ratio of the number of observations (blocks in this case) to the number of covariates 
is at least 5. 
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e. Matched pair designs 

Matched pair designs are blocked designs with one treatment and one control unit per block. Under 
these designs, similar study units are paired using observable baseline measures and one unit in a 
pair is then randomly assigned to the treatment group and the other to the control group. Matched 
pair designs are common in education research, especially for clustered designs, when sample sizes 
are small. These designs help avoid the possibility of a “bad draw” where, for example, higher ability 
students (as measured by their prior year test scores) are disproportionately assigned to one research 
condition. Matching can be performed using common algorithms that compute “distance” matrices 
between study units based on the closeness of their matching variables, which are then used to 
identify the best pairings across the sample to minimize a global distance metric. 

RCT-YES does not perform pairwise matching, but can accommodate such designs. Matched pair 
designs can be specified in RCT-YES by setting the MATCHED_PAIR variable equal to 1 and 
specifying code names for the matched pairs using the BLOCK_ID variable (see Table 2). 

Importantly, for these designs, RCT-YES includes pairs (blocks) in the analysis only if both members 
of the pair have available data. 

The simple differences-in-means estimator in (6.3) or (6.23) produces unbiased estimates for 
matched pair designs. Depending on the parameter of interest, the pairs can be weighted equally 
($w_b = 1$) because each pair has two students or, if germane, based on some broader pair 
population size. 
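
The handling of matched pairs described above can be sketched as follows. This is a minimal illustration, not the RCT-YES implementation: the record layout, the function names `complete_pairs` and `equal_weight_impact`, and the data are all hypothetical. The sketch keeps only pairs in which both members have outcome data and then averages the within-pair differences with equal pair weights.

```python
def complete_pairs(records):
    """Group records by pair and keep pairs where both members have outcome data."""
    by_pair = {}
    for rec in records:
        by_pair.setdefault(rec["block_id"], []).append(rec)
    kept = {}
    for block_id, members in by_pair.items():
        arms_with_data = sorted(m["treatment"] for m in members
                                if m["outcome"] is not None)
        # Require exactly one treatment and one control member with data.
        if arms_with_data == [0, 1]:
            kept[block_id] = members
    return kept


def equal_weight_impact(kept):
    """Average of within-pair treatment-minus-control differences (w_b = 1)."""
    diffs = []
    for members in kept.values():
        treated = next(m for m in members if m["treatment"] == 1)
        control = next(m for m in members if m["treatment"] == 0)
        diffs.append(treated["outcome"] - control["outcome"])
    return sum(diffs) / len(diffs)


records = [
    {"block_id": "P1", "treatment": 1, "outcome": 12.0},
    {"block_id": "P1", "treatment": 0, "outcome": 10.0},
    {"block_id": "P2", "treatment": 1, "outcome": 15.0},
    {"block_id": "P2", "treatment": 0, "outcome": None},  # missing: pair is dropped
    {"block_id": "P3", "treatment": 1, "outcome": 9.0},
    {"block_id": "P3", "treatment": 0, "outcome": 8.0},
]

kept = complete_pairs(records)
impact = equal_weight_impact(kept)
print(sorted(kept), impact)
```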

For matched pair designs, without further assumptions, variance estimators for the FP model are not 
identifiable because each pair contains only 1 treatment and 1 control group member (Imai, King, 
and Nall, 2009; Imbens, 2011). To address this issue, RCT-YES estimates ATEs for matched pair 
designs using the SP variance estimator in (6.25) for the PATE parameter. Imai, King, and Nall 
(2009) recommend this approach for clustered, matched pair designs and show that the variance 
estimator in (6.20) is conservative (an upper bound) for the FP parameter. Baseline covariates can 
be incorporated into the analysis using the same approach as for the PATE parameter discussed 
above. 

RCT-YES uses the same methods for the matched pair design for subgroup analyses. A potential 
problem with these analyses is that each member of a pair may not always have the same values for 
the subgroup variables— that is, the pairing can be “broken” in some cases. RCT-YES excludes such 
broken pairs from the analysis. If this problem is common, users may want to exclude these 
subgroups from the analysis. 
f. The CACE parameter 

For Design 2, RCT-YES estimates the CACE parameter (if requested) using the same general 
methods as for Design 1. The program obtains CACE estimates by dividing the estimated ATEs for 
the outcomes ($\hat{\beta}_{nclus,blocked}$) by the estimated ATEs for the service receipt variables ($\hat{p}_{CL}$). For the default FP 
model, the CACE estimates are obtained separately by block and then weighted to obtain pooled 
CACE estimates. To calculate standard errors of the CACE estimates, the program uses (5.57) where 
the variance terms are calculated using the pertinent Design 2 variance formulas for $\hat{\beta}_{nclus,blocked}$ and $\hat{p}_{CL}$. 
The covariance term in (5.57) for the default FP model with BLOCK_FE=0 is calculated using 

(6.31) $\widehat{Cov}(\hat{\beta}_{nclus,blocked,FP}, \hat{p}_{CL}) = \frac{\sum_{b=1}^{h} w_b^2 \, \widehat{Cov}(\hat{\beta}_{nclus,b,FP}, \hat{p}_{CL,b})}{\left( \sum_{b=1}^{h} w_b \right)^2}$, 


where $\widehat{Cov}(\hat{\beta}_{nclus,b,FP}, \hat{p}_{CL,b})$ is calculated using (5.58) applied to each block, and similarly for the 
CATE model. If the BLOCK_FE=1 option is specified for the FP or CATE models, RCT-YES ignores 
the covariance terms. For the PATE and UATE models, RCT-YES calculates the covariance 
terms using 
where $\hat{p}_{CL,b}$ is the estimated ATE on the service receipt rate in block $b$. 


To estimate the CACE parameter for the default FP model, RCT-YES excludes blocks where there is 
no variation in either the outcome variable or the service receipt variable for either research group. 
Thus, many blocks could be excluded, for example, if nearly all treatment group members received 
intervention services and very few control group members were crossovers. In this case, users may 
want to consider the BLOCK_FE=1 option for the FP model. 
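
The block-by-block ratio calculation and the exclusion rule can be sketched as follows. This is only an illustration under stated assumptions: the data layout, the function names, the block weights, and the simple weighted average used for pooling are hypothetical, and standard errors from (5.57) are not computed. Each block ratio divides the outcome ATE by the service receipt ATE, and blocks with no variation in the outcome or service receipt variable in either research group are dropped.

```python
from statistics import mean


def has_variation(values):
    """True if the values are not all identical."""
    return len(set(values)) > 1


def block_cace(block):
    """Ratio of the outcome ATE to the service receipt ATE within one block."""
    ate_outcome = mean(block["y_t"]) - mean(block["y_c"])
    ate_receipt = mean(block["d_t"]) - mean(block["d_c"])
    if ate_receipt == 0:
        raise ValueError("service receipt ATE is zero; the ratio is undefined")
    return ate_outcome / ate_receipt


def pooled_cace(blocks):
    """Weighted average of block ratios over blocks that pass the variation rule."""
    kept = [b for b in blocks
            if all(has_variation(b[key]) for key in ("y_t", "y_c", "d_t", "d_c"))]
    total_weight = sum(b["weight"] for b in kept)
    estimate = sum(b["weight"] * block_cace(b) for b in kept) / total_weight
    return estimate, len(kept)


blocks = [
    {"weight": 8, "y_t": [10, 12, 14, 16], "y_c": [8, 9, 10, 11],
     "d_t": [1, 1, 1, 0], "d_c": [0, 0, 0, 1]},
    # All treatment members received services: no variation, so the block is excluded.
    {"weight": 8, "y_t": [11, 13, 12, 14], "y_c": [9, 10, 9, 10],
     "d_t": [1, 1, 1, 1], "d_c": [0, 0, 1, 0]},
    {"weight": 8, "y_t": [20, 22, 24, 26], "y_c": [19, 20, 21, 22],
     "d_t": [1, 1, 0, 1], "d_c": [0, 1, 0, 0]},
]

estimate, n_kept = pooled_cace(blocks)
print(estimate, n_kept)
```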
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7. Design 3: The clustered, non-blocked design 

This chapter discusses ATE estimators under the Neyman-Rubin-Holland model where clusters or 
groups (such as schools or classrooms) rather than students are randomly assigned to a treatment or 
control condition within a single population. Under these designs, all students within a cluster (for 
example a school) are assigned to the treatment or control status of their cluster. These group-based 
designs are common in education research, because education RCTs often test interventions that 
are targeted to the group (for example, a school re-structuring initiative or professional development 
services for all teachers in a school). Thus, for these types of interventions, it is infeasible to randomly 
assign the treatment directly to students. Furthermore, clustered designs can help minimize the 
potential spillover of intervention effects from treatment to control students, thereby increasing the 
plausibility of the SUTVA condition that underlies the Neyman-Rubin-Holland framework (see 
Chapter 4). 

For clustered designs, RCT-YES estimates ATEs using design-based methods where individual-level 
data are averaged to the cluster level. Accordingly, RCT-YES can accommodate data in two formats. 
First, the program can use individual-level data that RCT-YES averages to the cluster level 
(CLUSTER_DATA=1). Second, the program can use data that have already been averaged to the cluster 
level, for example, average student test scores in a study school (CLUSTER_DATA=0). For this 
option, a separate set of cluster-level means is required for the full sample analysis as well as for each 
subgroup analysis. Furthermore, for this option, users are required to include the CLUSTER_FULL 
variable in the input data file that indicates whether the cluster average pertains to the full sample 
or a subgroup (see Table 2). Users must also specify the STD_OUTCOME 
input variable that specifies the student-level standard deviation for each outcome for the control 
group if users want the estimated impacts to be scaled into effect size units in the output tables. 

RCT-YES requires that non-missing cluster identifiers be specified for each observation in the input 
data file; otherwise the analysis is not performed. The cluster identifiers could be masked to hide 
the true identities of the clusters. RCT-YES includes clusters in the analysis that have at least one 
student with available outcome data. This holds for both the full sample and subgroup analyses. The 
program produces summary statistics on cluster sizes and weights (if specified). 
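
The averaging step for individual-level input can be sketched as follows. This is a hedged illustration only: the function name `cluster_means` and the record layout are hypothetical. It mirrors the rules stated above, namely that every observation needs a cluster identifier and that a cluster is retained if at least one student has outcome data.

```python
from collections import defaultdict


def cluster_means(records):
    """Average student outcomes to the cluster level.

    records: iterable of (cluster_id, outcome) pairs; outcome is None if missing.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cluster_id, outcome in records:
        if cluster_id is None:
            raise ValueError("every observation needs a non-missing cluster identifier")
        if outcome is None:
            continue  # students without outcome data do not contribute
        sums[cluster_id] += outcome
        counts[cluster_id] += 1
    # Only clusters with at least one available outcome appear in the result.
    return {cid: sums[cid] / counts[cid] for cid in counts}


records = [("A", 10.0), ("A", 14.0), ("B", None), ("C", 7.0), ("C", None)]
means = cluster_means(records)
print(means)
```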

The approach of averaging the data to the cluster level makes it clear that clustered designs have less 
statistical power than non-clustered ones. Although not readily apparent, variance estimators under 
HLM models are also largely based on the variation in mean outcomes across clusters. The key 
difference between the HLM and design-based approaches is the weighting scheme used to pool 
clusters for impact estimation; HLM methods use precision weighting (which requires the estimation 
of model variance components), whereas design-based methods use weights based on the ATE 
parameter of interest and associated sampling theory (see Chapter 4). 
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In the remainder of this chapter, we discuss ATE estimators and their standard errors for the FP and 
SP models for Design 3. As before, we consider models with and without baseline covariates. There 
is a much smaller literature on design-based models for clustered designs than non-clustered designs. 
However, with some slight modifications, the methods for non-clustered designs largely apply to 
clustered designs where data are averaged to the cluster level. 

For simplicity, we hereafter refer to clusters as “schools,” although clusters could also be classrooms, 
school districts, or other groups of students. For the analysis, we use similar notation as in Chapter 
5 with the addition of the subscript “j” to indicate schools. For instance, $Y_{ij}(1)$ and $Y_{ij}(0)$ are 
potential outcomes for student $i$ in school $j$, $y_{ij}$ is the observed outcome, and $T_j$ is the treatment 
status indicator variable that equals 1 if school $j$ is randomly assigned to the treatment condition 
and 0 for control schools. We assume that the sample contains $m$ schools with $m_T = mp$ treatment 
schools and $m_C = m(1 - p)$ control schools, where $p$ is the sampling rate to the treatment group 
($0 < p < 1$). It is assumed that school $j$ has $n_j$ students. 


a. FP model without baseline covariates 

In this section, we discuss simple differences-in-means estimators for the FP model, first for the full 
sample analysis and then for the subgroup analysis. We also discuss the calculation of intraclass 
correlations (ICCs), the use of nonresponse weights, and baseline equivalence tests. For the FP 
model— which is the default in RCT-YES— student and school potential outcomes are assumed to be 
fixed for the study. 


Full sample analysis 

The ATE parameter for the FP model for the clustered design is 

(7.1) $\beta_{clus,FP} = \frac{\sum_{j=1}^{m} w_j (\bar{Y}_j(1) - \bar{Y}_j(0))}{\sum_{j=1}^{m} w_j}$, 

where $w_j$ is the fixed school weight and $\bar{Y}_j(1) = \sum_{i=1}^{n_j} Y_{ij}(1) / n_j$ and $\bar{Y}_j(0) = \sum_{i=1}^{n_j} Y_{ij}(0) / n_j$ 
are mean potential outcomes in the treatment and control conditions, respectively. The ATE 
parameter is a weighted average of the ATE parameters in each school. 

A central research question is whether interest lies in intervention effects for (1) the average student 
in the sample ($w_{ij} = 1$ and $w_j = n_j$) or (2) a student in the average school in the sample ($w_{ij} = (1/n_j)$ 
and $w_j = 1$). This distinction will only matter if student sample sizes vary across schools and ATEs 
vary by school size. The default weight in RCT-YES is $w_j = 1$, so that each school is weighted equally 
in the analysis; this weighting scheme aligns with the random assignment mechanism. In this case, 
the ATE parameter is $\sum_{j=1}^{m} (\bar{Y}_j(1) - \bar{Y}_j(0)) / m$. 

If interest lies instead in ATEs for the average student, RCT-YES users can include a weight variable 
in the input data file where $w_{ij} = 1$ for each observation. In this case, $\beta_{clus,FP}$ is conceptually similar 
to the ATE parameter for the non-clustered design. To demonstrate this more formally, if $w_j = n_j$, 
we can express the ATE parameter as a function of student-level potential outcomes as follows: 

(7.2) $\beta_{clus,FP} = \bar{\bar{Y}}(1) - \bar{\bar{Y}}(0) = \frac{1}{n} \sum_{j=1}^{m} \sum_{i=1}^{n_j} (Y_{ij}(1) - Y_{ij}(0))$, 


where $n = \sum_{j=1}^{m} n_j$ is the total student sample size. It is clear that (7.2) is the average ATE for students 
in the study sample. 
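
The two weighting schemes can be compared numerically. The sketch below uses made-up potential outcomes for two schools (in practice only one potential outcome per school is observed) and hypothetical function names. Under those assumptions, it checks that weighting school-level ATEs by school size reproduces the student-level average in (7.2), whereas equal school weights give the ATE for the average school.

```python
from fractions import Fraction


def school_ate(students):
    """Mean of Y(1) - Y(0) over the students in one school."""
    return sum(Fraction(y1) - Fraction(y0) for y1, y0 in students) / len(students)


def weighted_ate(schools, weights):
    """Weighted average of school-level ATEs, as in the FP parameter."""
    return sum(w * school_ate(s) for s, w in zip(schools, weights)) / sum(weights)


# Made-up (Y(1), Y(0)) pairs: a small school with a large effect, a large school with a small one.
schools = [
    [(12, 10), (14, 10)],
    [(11, 10)] * 6,
]

sizes = [len(s) for s in schools]
all_students = [pair for s in schools for pair in s]

ate_average_student = weighted_ate(schools, sizes)    # w_j = n_j
ate_average_school = weighted_ate(schools, [1, 1])    # w_j = 1 (the default)
student_level_mean = school_ate(all_students)          # right-hand side of (7.2)

print(ate_average_student, ate_average_school, student_level_mean)
```

The two parameters differ here because the school-level ATEs vary with school size, which is the situation the text flags.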

For the clustered design, the data generating process for the observed mean outcome for a school, 
$\bar{y}_j = \sum_{i=1}^{n_j} y_{ij} / n_j$, can be expressed as follows: 

(7.3) $\bar{y}_j = T_j \bar{Y}_j(1) + (1 - T_j) \bar{Y}_j(0)$. 

This simple relation underlies all the estimators and standard errors for $\beta_{clus,FP}$ that are developed 
in this section and that are used in RCT-YES. 

In what follows, our statistical analysis considers the general case where weights differ across clusters. 
The default RCT-YES specification where cluster weights are equal is a special, simplified case of the 
more general analysis. 

Consider the simple differences-in-means estimator for $\beta_{clus,FP}$: 

(7.4) $\hat{\beta}_{clus,FP} = \frac{\sum_{j:T_j=1} w_j \bar{y}_j}{\sum_{j:T_j=1} w_j} - \frac{\sum_{j:T_j=0} w_j \bar{y}_j}{\sum_{j:T_j=0} w_j}$. 

This estimator is biased in finite samples if the weights differ across schools (and school-level ATEs 
are heterogeneous), because the denominators in (7.4) will depend on the particular allocation of 
schools to the treatment and control groups. To see this, consider an example with $w_j = n_j$, where 
the sample contains 4 schools with respective student sample sizes $(n_1, n_2, n_3, n_4)$. Accordingly, the 
true ATE parameter for the sample is 


(7.5) $\beta_{clus,FP} = \frac{\sum_{j=1}^{4} n_j (\bar{Y}_j(1) - \bar{Y}_j(0))}{\sum_{j=1}^{4} n_j}$. 

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Suppose next that 2 schools are randomly assigned to the treatment group and 2 are randomly 
assigned to the control group. Note that there are 6 possible allocations of the four schools to the 
treatment and control groups. As an illustration, if the first two schools were selected for the 
treatment group, the estimated treatment effect would be 





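$\frac{n_1 \bar{Y}_1(1) + n_2 \bar{Y}_2(1)}{n_1 + n_2} - \frac{n_3 \bar{Y}_3(0) + n_4 \bar{Y}_4(0)}{n_3 + n_4}$ 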


and similarly for the other 5 possible treatment-control allocations. Averaging over all six equally 
likely outcomes yields 
For (7.6) to be unbiased for the ATE parameter in (7.1), one of three conditions must hold: (1) 
the ATEs are homogeneous across schools (that is, $(\bar{Y}_j(1) - \bar{Y}_j(0)) = k$ for some constant $k$); (2) school 
sample sizes are equal; or (3) schools are weighted equally (the default in RCT-YES). Otherwise, 
$\hat{\beta}_{clus,FP}$ is biased in finite samples. 
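
The four-school example can be checked by brute force. The sketch below enumerates the six equally likely allocations of two schools to treatment, computes the size-weighted differences-in-means estimate for each, and compares the average with the true ATE. The school sizes and potential outcomes are made up, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def true_ate(sizes, y1, y0):
    """Size-weighted average of school-level ATEs (weights w_j = n_j)."""
    return Fraction(sum(n * (a - b) for n, a, b in zip(sizes, y1, y0)), sum(sizes))


def diff_in_means(sizes, y1, y0, treated):
    """Weighted differences-in-means estimate for one allocation of schools."""
    t = set(treated)
    c = [j for j in range(len(sizes)) if j not in t]
    mean_t = Fraction(sum(sizes[j] * y1[j] for j in t), sum(sizes[j] for j in t))
    mean_c = Fraction(sum(sizes[j] * y0[j] for j in c), sum(sizes[j] for j in c))
    return mean_t - mean_c


def expected_estimate(sizes, y1, y0, n_treated=2):
    """Average of the estimator over all equally likely allocations."""
    allocations = list(combinations(range(len(sizes)), n_treated))
    total = sum(diff_in_means(sizes, y1, y0, a) for a in allocations)
    return total / len(allocations)


unequal = [10, 20, 30, 40]
equal = [25, 25, 25, 25]
y0 = [4, 4, 4, 4]
y1_heterogeneous = [5, 6, 7, 9]   # school ATEs of 1, 2, 3, and 5

bias_unequal = expected_estimate(unequal, y1_heterogeneous, y0) - true_ate(unequal, y1_heterogeneous, y0)
bias_equal = expected_estimate(equal, y1_heterogeneous, y0) - true_ate(equal, y1_heterogeneous, y0)
print(bias_unequal, bias_equal)
```

With unequal sizes and heterogeneous school ATEs the average estimate differs from the true ATE, while equal sizes remove the discrepancy, in line with the conditions listed above.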

If the weights differ across schools, however, the simple differences-in-means estimator is 
asymptotically unbiased as the number of schools, m , approaches infinity. The parameter in (7.1) is 
still the FP parameter of interest, but it is convenient to conduct the analysis using an asymptotic 
version of this parameter. To define this parameter, we assume that for large $m$: 

m 
where $ATE_j = (\bar{Y}_j(1) - \bar{Y}_j(0))$ is the ATE parameter for school $j$, and $E_{FP}$ signifies expectations 

over the increasing sequence of finite populations, which are assumed to be fixed, nonnegative real 
numbers. Using (7.4), we find then that as m approaches infinity: 
The parameter in (7.8) is the large-sample FP parameter for our analysis of the clustered design. 
Note that (7.8) holds because $T_j$ is independent of both the school-level potential outcomes 
$(\bar{Y}_j(1), \bar{Y}_j(0))$ and school-level weights ($w_j$) due to random assignment. 


One approach suggested by Imbens (2011) for minimizing the bias of $\hat{\beta}_{clus,FP}$ when weights differ 
across schools is to group treatment and control schools into blocks with similar weights (for 
example, with similar sample sizes). An unweighted analysis can then be conducted in each block 
and the full sample estimate can be calculated as a weighted average of the block-specific impact 
estimates. Variance estimates can be obtained using a similar approach. 

RCT-YES does not employ this post-stratification approach, however, because of the difficulty in 
automating this approach without some user input on how to create the school strata (such as the 
number of strata to select and the associated cutoff values for defining the strata). Instead, RCT-YES 
adopts a large-sample approach to calculate the variance of $\hat{\beta}_{clus,FP}$. 


To examine the asymptotic properties of $\hat{\beta}_{clus,FP}$ (and corresponding estimators that include baseline 
covariates), we use the relation in (7.3) to develop a regression model similar to the one used for the 
non-clustered design: 

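(7.9) $\bar{y}_j = \beta_0 + \beta_{clus,FP}(T_j - p) + \eta_j$, where 

$\beta_{clus,FP} = \bar{\bar{Y}}_W(1) - \bar{\bar{Y}}_W(0) = \sum_{j=1}^{m} w_j (\bar{Y}_j(1) - \bar{Y}_j(0)) / \sum_{j=1}^{m} w_j$, 

$\beta_0 = p \bar{\bar{Y}}_W(1) + (1 - p) \bar{\bar{Y}}_W(0)$, 

$\eta_j = \alpha_j + \tau_j (T_j - p)$, 

$\alpha_j = p(\bar{Y}_j(1) - \bar{\bar{Y}}_W(1)) + (1 - p)(\bar{Y}_j(0) - \bar{\bar{Y}}_W(0))$, 

$\tau_j = (\bar{Y}_j(1) - \bar{\bar{Y}}_W(1)) - (\bar{Y}_j(0) - \bar{\bar{Y}}_W(0))$. 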
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Note that (7.9) can be derived by averaging the following regression model at the student level to the 
school level: 

(7.9a) $y_{ij} = \beta_0 + \beta_{clus,FP}(T_j - p) + (u_j + e_{ij})$, where 

$u_j = T_j (\bar{Y}_j(1) - \bar{\bar{Y}}_W(1)) + (1 - T_j)(\bar{Y}_j(0) - \bar{\bar{Y}}_W(0))$ and 

$e_{ij} = T_j (Y_{ij}(1) - \bar{Y}_j(1)) + (1 - T_j)(Y_{ij}(0) - \bar{Y}_j(0))$ 

are school- and student-level error terms, respectively. Note that the student-level error terms, $e_{ij}$, 
disappear when averaging this model to the school level. 

The following lemma provides the large sample properties of the weighted least squares estimator 
for $\beta_{clus,FP}$ in (7.9) using the weights $w_j$; the proof is in Appendix A (see also Schochet 2013 for 
a proof of a special case of this lemma). 

Lemma 7.1. As the number of schools, m , increases to infinity for an increasing sequence of finite populations, 
assume that 
where the asymptotes are fixed, nonnegative, real numbers. Then, the weighted least squares estimator, $\hat{\beta}_{clus,FP}$, 
is asymptotically normal with asymptotic mean $E_{FP}(w_j ATE_j) / E_{FP}(w_j)$ (the large-sample parameter in (7.8)) and asymptotic 
variance: 
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The variance formula in (7.11) is very similar to the corresponding variance formula for the non- 
clustered design in (5.8) for Design 1. The first two terms in the brackets, and , pertain to 

the extent to which school-level potential outcomes vary across schools. The term pertains to the 
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extent to which school-level treatment effects vary across schools, which is not identifiable because we 
can only observe the outcomes of schools in either the treatment or control condition, but not both. 


RCT-YES uses the following upper-bound variance estimator for (7.11): 
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and p — {lflj-l m). Note that if Wj -n., we can express and in terms of student-level 
outcomes as follows: 
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which pertain to the extent to which student-level potential outcomes vary and co-vary across 
students within the same schools. Similar robust variance estimators can be obtained using the 
generalized estimating equation (GEE) approach developed by Liang and Zeger (1986) for clustered 
data assuming an independent working correlation structure, an identity link function, and the 
empirical sandwich variance estimator. 

RCT-YES conducts hypothesis testing for the clustered FP design using t-tests with $(m_T + m_C - 2)$ 
degrees of freedom. The number of degrees of freedom is based on the number of schools because 
the analysis is conducted at the school level. 


Calculating intraclass correlation coefficients (ICCs) 

Design effects for a clustered design are typically defined as the inflation in the variance estimates 
due to clustering relative to a simple random sample design of the same size. Cochran (1977), 
Donner and Klar (2000), Kish (1965), and Murray (1998) discuss the calculation of design effects in 
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terms of the intraclass correlation coefficient (ICC), which is the proportion of variance in the 
outcome that lies between clusters. This relationship is often approximated as follows: 

(7.13) $Deff\text{-}Clus = 1 + \rho(\bar{n} - 1)$, 

where $\rho$ is the ICC and $\bar{n}$ is the average cluster (school) size. The ICC is an 
important parameter to help interpret variance estimates for clustered designs and to calculate 
statistical power to assess appropriate sample sizes when designing clustered RCTs. 

If data are provided at the individual level, RCT-YES calculates $\hat{\rho}$ in two steps: (1) estimating 
Deff-Clus by dividing the variance estimator for the clustered design in (7.12) by the variance 
estimator for the non-clustered design in (5.10) and (2) solving for $\rho$ in (7.13). This yields the 
following estimator for $\rho$: 


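(7.14) $\hat{\rho} = \frac{1}{(\bar{n} - 1)} \left( \frac{\widehat{AsyVar}(\hat{\beta}_{clus,FP})}{\widehat{AsyVar}(\hat{\beta}_{nclus,FP})} - 1 \right)$. 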


RCT-YES reports $\hat{\rho}$ and Deff-Clus in the program output for full sample analyses. To ensure 
consistency of ICC calculations across studies, RCT-YES uses (7.14) to calculate $\hat{\rho}$ for all considered 
clustered designs for Designs 3 and 4, including FP and SP designs and models with and without 
baseline covariates. The ICC is not reported for matched pair designs. 
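
The two-step ICC calculation can be sketched as follows. The function names and the variance values are hypothetical, and the two variance estimates are taken as given rather than computed from (7.12) and (5.10).

```python
def design_effect(var_clustered, var_nonclustered):
    """Step 1: ratio of the clustered to the non-clustered variance estimate."""
    return var_clustered / var_nonclustered


def icc_from_design_effect(deff, avg_cluster_size):
    """Step 2: solve Deff = 1 + rho * (n_bar - 1) for rho."""
    if avg_cluster_size <= 1:
        raise ValueError("average cluster size must exceed 1")
    return (deff - 1.0) / (avg_cluster_size - 1.0)


# Made-up variance estimates and average school size.
deff = design_effect(var_clustered=0.04, var_nonclustered=0.01)
rho_hat = icc_from_design_effect(deff, avg_cluster_size=21)
print(deff, rho_hat)
```

Substituting the resulting value back into (7.13) recovers the design effect, which is a quick consistency check on the algebra.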


Subgroup analysis 

RCT-YES requires that subgroups be defined as categorical variables. If the input data are provided at 
the individual level (CLUSTER_DATA=1), RCT-YES conducts the analysis by creating school-level 
averages for each subgroup. For instance, to examine gender subgroups, the program would create 
two school-level averages for a co-ed school (one for girls and one for boys) but only one school-level 
average for a single-sex school. The program also creates subgroup indicator variables, $G_{jg}$, for each 
school-level average. For instance, for the co-ed school from above, these indicators would take the 
values $G_{j1} = 1$ and $G_{j2} = 0$ for the girl observations and $G_{j1} = 0$ and $G_{j2} = 1$ for the boy 
observations. 
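
The construction of the stacked subgroup file can be sketched as follows. This is an illustration under assumptions: the function name `subgroup_school_means`, the record layout, and the data are hypothetical. Each school contributes one row per subgroup level present in the school, along with the indicator values for that row.

```python
from collections import defaultdict


def subgroup_school_means(records, levels):
    """Build one school-level average per (school, subgroup level) present in the data.

    records: iterable of (school_id, subgroup_level, outcome).
    levels: ordered list of subgroup levels, used to build the indicator values.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for school, level, outcome in records:
        sums[(school, level)] += outcome
        counts[(school, level)] += 1
    rows = []
    for school, level in sorted(counts):
        rows.append({
            "school": school,
            "subgroup": level,
            "mean": sums[(school, level)] / counts[(school, level)],
            "G": [int(level == g) for g in levels],  # indicator for each level
        })
    return rows


records = [
    ("S1", "girl", 80.0), ("S1", "girl", 90.0), ("S1", "boy", 70.0),  # co-ed school
    ("S2", "girl", 60.0), ("S2", "girl", 64.0),                        # single-sex school
]
rows = subgroup_school_means(records, levels=["girl", "boy"])
for row in rows:
    print(row)
```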

If the input data are provided at the cluster level (CLUSTER_DATA=0), the data file must contain 
school-level averages for the full sample as well as for each subgroup. Stated differently, the data file 
must parallel the constructed subgroup data file described above when CLUSTER_DATA=1. The 
data file must also contain an indicator variable specified in CLUSTER_FULL that signifies whether 
the school-level average is to be used for the full sample analysis or a subgroup analysis. 
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Using this data structure, the same methods discussed above for the full sample can be used to 
estimate ATEs for subgroups, because random assignment ensures that $T_j \perp (\bar{Y}_j(1), \bar{Y}_j(0))$ 
conditional on any covariate value defined by pre-randomization characteristics. The simple 
differences-in-means estimator for the ATE parameter for subgroup g is 
where $G_{jg}$ is a subgroup indicator variable defined above; $\bar{y}_{jg} = \sum_{i:G_{ijg}=1} y_{ij} / n_{jg}$ is the mean 
outcome for subgroup $g$ in school $j$; $n_{jg}$ is the number of students in the subgroup; $m_{Tg}$ and 
$m_{Cg}$ are the respective number of treatment and control schools that contain students in the 
subgroup; and $w_{jg}$ is the school-level weight for the subgroup analysis. 


Using Lemma 7.1 applied to subgroup $g$ and similar arguments as for the subgroup analysis in 
Chapter 5 for Design 1, we find that $\hat{\beta}_{clus,g,FP}$ is consistent and asymptotically normal with an 
asymptotic variance that can be estimated as follows: 
and $p_g = (m_{Tg} / m_g)$ is the observed proportion of schools containing students in subgroup $g$ 
that are in the treatment group. RCT-YES conducts hypothesis testing for the subgroup analysis using t-tests 
with $(m_{Tg} + m_{Cg} - 2)$ degrees of freedom. Because the estimated impacts are independent across 
subgroups in the FP model, RCT-YES conducts the chi-squared test in (5.35) to test the null 
hypothesis of no differences in estimated treatment effects across subgroup levels. 
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Including nonresponse weights 

If the input data file contains individual-level data and weights that adjust for data nonresponse (or 
for other reasons), RCT-YES uses the weights ($w_{ij}$) to estimate school-level means using the formula 
$\bar{y}_{jw} = \sum_{i=1}^{n_j} w_{ij} y_{ij} / \sum_{i=1}^{n_j} w_{ij}$. The program also incorporates nonresponse adjustments at the school 
level ($w_j = \sum_{i=1}^{n_j} w_{ij}$) to aggregate school means to obtain overall ATE estimates. 
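
A small sketch of the weighted school means follows. The function names and data are hypothetical, and the school-level weight is assumed here to be the sum of the student weights in the school; that choice is an assumption made for illustration. Under it, aggregating the weighted school means reproduces the weighted mean over all students in a research group.

```python
def weighted_school_mean(outcomes, weights):
    """Weighted mean of student outcomes within one school."""
    return sum(w * y for w, y in zip(weights, outcomes)) / sum(weights)


def pooled_group_mean(schools):
    """Aggregate school means using w_j = sum of student weights (an assumption).

    schools: list of (outcomes, weights) for the schools in one research group.
    """
    school_means = [weighted_school_mean(y, w) for y, w in schools]
    school_weights = [sum(w) for _, w in schools]
    total = sum(wj * m for wj, m in zip(school_weights, school_means))
    return total / sum(school_weights)


# Made-up treatment-group data: (student outcomes, student nonresponse weights).
schools = [
    ([10.0, 20.0], [1.0, 3.0]),
    ([30.0], [2.0]),
]

mean_a = weighted_school_mean(*schools[0])
pooled = pooled_group_mean(schools)
print(mean_a, pooled)
```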


Assessing baseline equivalence 

To assess baseline equivalence for the clustered FP design, RCT-YES conducts t-tests for each baseline 
covariate specified in BASE_EQUIV assuming equal variances for the treatment and control groups. 
For baseline covariate k , RCT-YES calculates the following t-statistic: 


(7.17) 
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( 1 ) , where 

m-p rrip. 


^kw ~ i^nw ~^ckw) ’ ^Tkw tittd are covariate means across all schools, 

" m^ + m^-2 ' " K-1),4, ' 

V^C r-Tj=0 

RCT-YES also uses Hotelling’s T-squared statistic to test the hypothesis that covariate means are 
jointly similar: 
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{Me -k) j-j.=0 


This statistic is distributed as $F(v, m_T + m_C - 1 - v)$, where $v$ is the number of covariates. By default, 
RCT-YES uses the rule that there must be at least 5 clusters per covariate for clustered designs or the 
joint test is not performed, although the cutoff rule of 5 can be changed using the OBS_COV option 
(see Table 2). 


b. FP model with baseline covariates 

For clustered designs, RCT-YES conducts multiple regression analyses using covariates averaged to 
the cluster level. These covariates can be student-related measures (for example, average student test 
scores in the school); teacher-related measures (for example, the percentage of teachers in the school 
with an advanced degree); or school-related measures (for example, school size or an indicator of 
whether the school is in a rural or urban setting). Importantly, program users should only input a 
small number of highly predictive covariates for the analysis. RCT-YES requires that the sample 
contains at least 5 clusters per covariate or the regression analysis is not performed. Thus, with 20 
schools, the model can include at most 4 covariates in addition to the intercept and treatment status 
indicator variable. The cluster-to-covariate ratio, however, can be changed using the OBS_COV 
input statement. For clustered designs, a missing covariate is imputed using the cluster-level mean if 
it is available; otherwise, the same imputation rules are applied as for Designs 1 and 2 using cluster- 
level data (see Chapter 5, Section i). 
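
The clusters-per-covariate rule can be written out as a short check. The function names below are hypothetical, and the `obs_cov` argument simply mirrors the default cutoff of 5 described above.

```python
def max_covariates(n_clusters, obs_cov=5):
    """Largest number of covariates allowed under the clusters-per-covariate rule."""
    return n_clusters // obs_cov


def regression_allowed(n_clusters, n_covariates, obs_cov=5):
    """True if there are at least obs_cov clusters per covariate."""
    if n_covariates == 0:
        return True
    return n_clusters / n_covariates >= obs_cov


print(max_covariates(20))         # covariates allowed with 20 schools
print(regression_allowed(20, 5))  # five covariates with 20 schools
```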

Next, we discuss the multiple regression estimator for the clustered design under the FP model, first 
for the full sample analysis and then for the subgroup analysis. 

Full sample analysis 

To examine the statistical properties of the weighted multiple regression estimator using the weights 
$w_j$, we use the regression model in (7.9) where the explanatory variables include a $1 \times v$ vector of 
fixed baseline cluster-level covariates, $\mathbf{x}_j$, with associated parameter vector $\boldsymbol{\gamma}$. As with the 
non-clustered design, the covariates are irrelevant variables and the ATE parameter for the FP model 
without covariates (the large-sample parameter in (7.8)) pertains also to the model with covariates. Note that we do not 
need to assume that the true conditional distribution of $\bar{y}_j$ given $\mathbf{x}_j$ is linear in $\mathbf{x}_j$. 

Let $\mathbf{z}_j = (1 \; T_j \; \mathbf{x}_j)$ be a vector of model explanatory variables. The weighted multiple regression 
estimator for the ATE parameter using the school-level weights $w_j$ is 

(7.18) $\hat{\beta}_{clus,MR,FP,W} = \left[ \left( \sum_{j=1}^{m} w_j \mathbf{z}_j' \mathbf{z}_j \right)^{-1} \sum_{j=1}^{m} w_j \mathbf{z}_j' \bar{y}_j \right]_{(2,2)}$. 



To examine the asymptotic moments of $\hat{\beta}_{clus,MR,FP,W}$, it simplifies the proofs to instead use the centered 
covariates $\tilde{T}_j$ and $\tilde{\mathbf{x}}_j$, where $\tilde{T}_j = (T_j - p)$ and $\tilde{x}_{jk} = (x_{jk} - \bar{x}_{kW})$ for school $j$ and covariate $k$; 
apart from the intercept, this centering has no effect on the parameter estimates. The following 
lemma uses similar results in Schochet (2013). The proof is in Appendix A. 
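
The claim that centering affects only the intercept can be verified on a toy data set. The sketch below solves the weighted normal equations exactly for a regression of school means on an intercept, the treatment indicator, and one covariate, with and without centering. The data, the helper names `solve` and `wls`, and the use of the weighted covariate mean for centering are assumptions made for illustration.

```python
from fractions import Fraction


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact fractions."""
    n = len(b)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def wls(rows, y, w):
    """Weighted least squares: solve (sum w z'z) b = sum w z'y."""
    k = len(rows[0])
    ztz = [[sum(wj * z[a] * z[c] for z, wj in zip(rows, w)) for c in range(k)]
           for a in range(k)]
    zty = [sum(wj * z[a] * yj for z, yj, wj in zip(rows, y, w)) for a in range(k)]
    return solve(ztz, zty)


# Made-up school-level data: treatment status, one covariate, mean outcome, weight.
T = [1, 1, 1, 0, 0, 0]
x = [3, 5, 4, 2, 6, 4]
y = [12, 15, 14, 9, 13, 10]
w = [10, 20, 30, 15, 25, 20]

p = Fraction(sum(T), len(T))
x_bar_w = Fraction(sum(wj * xj for wj, xj in zip(w, x)), sum(w))

raw = wls([[1, t, xj] for t, xj in zip(T, x)], y, w)
centered = wls([[1, t - p, xj - x_bar_w] for t, xj in zip(T, x)], y, w)

print(raw[1] == centered[1], raw[2] == centered[2])  # slope coefficients compared
```

Only the intercept changes, by the amount implied by the shift in the regressors.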

Lemma 7.2. As $m$ approaches infinity, assume (7.10) and that 
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where (X ^ and T ^ are defined in (7-9); and are mxm symmetric, finite, positive definite matrices; 
and ^xwa’^xwT’ ^xwa> ^xwT finite vxl vectors of fixed real numbers. Then, ^ is 

asymptotically normal with asymptotic mean pp^ and asymptotic variance: 
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where y = 


The first part of the right-hand side in (7.20) is the variance estimator under the FP model without 
covariates, so the remaining terms represent precision gains (or losses in rare cases) from adding 
covariates. 


A direct approach for estimating the components of (7.20) is as follows: 
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where the covariance matrixes are estimated using sample moments: 
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In this expression, , 'P-j-yv > 'P^^are vxl vectors of weighted sample covariances 

between and for the treatment and control groups, respectively: 
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This estimation approach becomes more cumbersome for the subgroup analysis and the blocked, 
clustered design (Design 4) discussed in the next chapter. Thus, similar to Designs 1 and 2, RCT- 
YES instead estimates (7.20) using weighted regression residuals: 
are regression mean square errors for the treatment and control groups, respectively; $\hat{\beta}_0$, 
$\hat{\beta}_{clus,MR,FP,W}$, and $\hat{\boldsymbol{\gamma}}$ are parameter estimates from a weighted regression of $\bar{y}_j$ on $\mathbf{z}_j = [1 \; T_j \; \mathbf{x}_j]$; and 
other terms are defined above. 

RCT-YES conducts hypothesis tests for the multiple regression estimator using t-tests with 
$(m_T + m_C - v - 2)$ degrees of freedom, where $v$ is the number of baseline covariates. 
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Subgroup analysis 

To obtain regression-adjusted estimators for the subgroup analysis, RCT-YES stacks school-level 
averages for each subgroup and estimates the following regression model that is based on the relation 

y. = y* G y. : 

j Z—ig=i Jg-^jg 

(7.23) n = 

g=l g=l 


where 7b ,■ =T, — p„, ?7 , = y ^ G + r f . ) is the error term, and is observed subgroup 

g) ] ^g 'J Jg^ J J a' ^g or 

sampling rate to the treatment group. In this model, gURFPw the ATE parameter for 

subgroup g . 


If baseline covariates are added to (7.23), it can be shown using the methods from Lemmas 5.5 and 
7.1 that the weighted multiple regression estimator for is consistent and asymptotically normal. 

RCT-YES uses the following new asymptotic variance estimator for 0^ : 
$p_g = (m_{Tg} / m_g)$, and $q_g = (m_g / m)$ is the proportion of all schools containing subgroup $g$. 


RCT-YES conducts hypothesis testing for subgroup analyses using t-tests with $(m_{Tg} + m_{Cg} - v q_g - 2)$ 
degrees of freedom. To test the null hypothesis of no differences in estimated treatment effects across 
subgroup levels, RCT-YES applies the chi-squared test in (5.35) using the regression-adjusted impact 
and variance estimators. 
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c. SP model without baseline covariates 

The SP model under the clustered design assumes that school-level and/or student-level potential 
outcomes are random draws from super-population distributions. As with the blocked design, RCT- 
YES can estimate several SP model parameters for Design 3. First, by default, RCT-YES estimates the 
PATE parameter that assumes random sampling of both schools and students within schools. HLM 
methods that are often used in education research to analyze experimental data focus on the PATE 
parameter. Second, RCT-YES can estimate the CATE parameter (random sampling of students, but 
not schools). Finally, the program can estimate the UATE parameter (random sampling of schools, 
but not students). The CATE and UATE parameters can be estimated using the CATE_UATE 
program input option. 

An important issue for SP clustered designs is how RCT-YES users should treat clustering (nesting) 
below the unit of random assignment. To explain this issue, it is helpful to consider an example of 
a clustered design where schools are randomized to a treatment or control group and where 
classrooms within the study schools are assumed to be representative of a broader set of classrooms 
(or are actually randomly sampled, for example, to reduce data collection costs). In this design, 
students are nested within classroom clusters, which are in turn nested within school clusters. In 
this case, RCT-YES users should account for clustering effects due to school-level randomization (to 
account for the experimental design), which will also capture clustering effects due to the lower-level 
classroom sampling. Stated differently, for this design, school identifiers should be specified in the 
CLUSTER_ID input variable, not classroom identifiers.

For the statistical treatment of SP clustered designs, we assume infinite super-populations so that
finite sample corrections do not apply (this approach yields conservative variance estimators). In 
practice, however, users may want to assume random sampling of clusters and students from finite 
sample universes. Thus, RCT-YES allows weights to differ across clusters. 

For clustered designs, the UATE parameter is the simplest to analyze because it is similar to the FP
parameter for Design 3 except that schools are assumed to be a random sample from the super-population
of schools ($S$) rather than being fixed for the study. The UATE parameter can be
expressed as $E_S(\beta_{clus,FP})$, which averages the FP parameter over all possible samples of $m$ schools
from $S$. Thus, similar estimation methods can be used for the UATE and FP parameters, the only
differences being the exclusion of the FP heterogeneity term from the variance estimators and the
choice of cluster weights.

The situation is more complex for the PATE and CATE parameters. In what follows, we first discuss 
the PATE parameter in detail and then discuss the CATE parameter as a special case of the PATE 
parameter. 

The PATE parameter for the clustered design is

(7.25) 


which is the expected value of the treatment effect in the super-population of students ($I$) within
$S$. To examine this parameter further, let $\mu_{Tj} = E_I(Y_{ij}(1))$ and $\mu_{Cj} = E_I(Y_{ij}(0))$ be mean potential
outcomes in $I$ for school $j$, and let $\sigma^2_{Tj} = Var_I(Y_{ij}(1))$ and $\sigma^2_{Cj} = Var_I(Y_{ij}(0))$ be corresponding
super-population variances. We can then express the (asymptotic) PATE parameter as

(7.26) $\beta_{clus,PATE} = \dfrac{E_S(w_j[\mu_{Tj} - \mu_{Cj}])}{E_S(w_j)}$.


In RCT-YES, the default is $w_j = 1$, but other weighting schemes might be more appropriate for the
PATE parameter. 

As discussed next, the PATE parameter for the clustered design can be estimated consistently using 
a simple differences-in-means approach, with an asymptotic variance estimator that is similar to that 
for the UATE parameter despite the assumed multilevel sampling of schools and students. 


Full sample analysis for the PATE parameter 

Consider the simple differences-in-means estimator for the PATE parameter: 

To show that this estimator is consistent, we use the law of iterated expectations, where we
sequentially average over $I$, the randomization distribution ($R$), and $S$:

The following new lemma presents the asymptotic properties of $\hat{\beta}_{clus,PATE}$. The proof is provided in
Appendix A.

Lemma 7.3. Let $\hat{\beta}_{clus,PATE}$ be the weighted simple differences-in-means estimator in (7.27) for the PATE
parameter in (7.26). Then, as $m$ approaches infinity, $\hat{\beta}_{clus,PATE}$ is asymptotically normal with asymptotic
mean $\beta_{clus,PATE}$ and asymptotic variance:

A consistent estimator for the variance in (7.29) is


(7.30)

where $s^2_{TW}$ and $s^2_{CW}$ were defined in (7.12).


Note that the first-order variance approximation in (7.29) does not contain terms reflecting the 
variances of student potential outcomes within schools. As shown in the proof in Appendix A, these 
within-school variance terms vanish from the asymptotic variance expression because they are of 
lower order in $m$ than the between-school variance terms. The
within-school variance terms, however, enter the variance formula in finite school populations (see 
Cochran, 1977, pages 300-306). 

RCT-YES conducts hypothesis testing for the PATE parameter using t-tests with $(m_T + m_C - 2)$
degrees of freedom. 
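
As a rough illustration of the full-sample calculation under the default equal school weights ($w_j = 1$), the sketch below computes a difference in school-level mean outcomes, a between-school variance estimate, and the t-test degrees of freedom. The function name and the sample data are hypothetical. The variance form used here (each arm's between-school sample variance divided by its number of schools) is an assumption about how (7.30) simplifies under equal weights, not a transcription of it.

```python
from math import sqrt
from statistics import mean, variance


def cluster_diff_in_means(treat_means, control_means):
    """Difference in school-level means with equal school weights (w_j = 1).

    Inputs are lists of school-level mean outcomes, one entry per school.
    Returns (estimate, standard_error, degrees_of_freedom).
    """
    m_t, m_c = len(treat_means), len(control_means)
    if m_t < 2 or m_c < 2:
        raise ValueError("need at least 2 schools in each research group")
    estimate = mean(treat_means) - mean(control_means)
    # statistics.variance uses the (m - 1) divisor, so each term is a
    # between-school sample variance divided by the number of schools.
    var_hat = variance(treat_means) / m_t + variance(control_means) / m_c
    return estimate, sqrt(var_hat), m_t + m_c - 2


# Hypothetical school means: 4 treatment schools and 3 control schools.
est, se, df = cluster_diff_in_means([52.0, 55.0, 49.0, 56.0], [50.0, 47.0, 51.0])
```

Only between-school variation enters this calculation, in line with the note above that within-school variance terms drop out of the first-order approximation.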


Subgroup analysis for the PATE parameter 

A similar estimation approach as for the full sample analysis can be used to estimate the PATE 
parameter for the subgroup analysis: the simple differences-in-means estimator in (7.27) and the 
variance estimator in (7.30) can be applied separately for each subgroup. 

For the PATE specification, the mean outcomes of student subgroups (for example, girls and boys)
within the same school could be correlated. This occurs because students in the same school share a
common school random effect, that is, a common school-level potential outcome sampled from $S$.
To adjust for these correlations in the joint chi-squared tests in (5.35) to assess differences in
subgroup impacts, RCT-YES estimates the covariances between the impact estimates for subgroups
$g$ and $g'$ as follows:

The covariances in (7.31) are used to construct the variance-covariance matrix in (5.35) to
conduct the joint chi-squared tests.

We considered using an alternative approach where the calculations for each covariance would be 
conducted using only schools that contain students in both subgroups. A problem with this 
approach, however, is that the samples used to calculate the covariances could differ across subgroup
pairs, because some schools might not contain all subgroups. This could lead to variance-covariance matrices that
are not positive definite. Restricting the analysis sample to only those schools that contain all
subgroups could help with this problem, but the resulting sample could be small and
nonrepresentative if some subgroups are concentrated in certain schools. Thus, as a compromise,
RCT-YES uses the full sample to calculate each covariance, where schools without particular subgroups
do not contribute to the numerators of (7.31) but enter the denominators. The diagonals of the matrix
(which contain the variances of the subgroup impacts), however, are based on (7.30), which is applied
separately to each subgroup, and are calculated using only schools that contain students in the
considered subgroup.


The CATE parameter 

The CATE parameter is a special case of the PATE parameter where schools are no longer assumed
to be representative of a broader school population, but only of themselves. This parameter can be
expressed as $\beta_{clus,CATE} = \sum_{j=1}^{m} w_j(\mu_{Tj} - \mu_{Cj}) / \sum_{j=1}^{m} w_j$ and can be interpreted as the expected
treatment effect for the super-population of students in the study schools.

Similar methods to (7.28) can be used to show that the simple differences-in-means estimator is
consistent for the CATE parameter (where averaging is conducted sequentially over $I$ and $R$ but
not $S$). The asymptotic variance of $\hat{\beta}_{clus,CATE}$ is shown in (A.42) in Appendix A as part of the proof
for Lemma 7.3. RCT-YES uses the variance estimator in (7.30) for the CATE parameter.

Assessing baseline equivalence 

To assess baseline equivalence for all the SP parameters, RCT-YES conducts t-tests using (7.17) and 
the joint F-test in (7.17a). 


d. SP model with baseline covariates 

RCT-YES incorporates baseline covariates for the clustered SP model using a similar approach as for 
the clustered FP model. The program estimates a weighted regression model using data averaged to 
the cluster level, and uses model residuals to estimate variances. Using this approach, RCT-YES uses 
the following variance estimator for the PATE, CATE, and UATE parameters: 


(7.32) 

A similar approach is used for the subgroup analysis using (7.24), excluding the FP heterogeneity
term. 


e. The CACE parameter

The analysis of the complier average causal effect (CACE) parameter is more complex for clustered 
RCT designs than nonclustered ones because compliance decisions can be made by cluster-level staff 
as well as by individuals within clusters. For example, in the education area, the extent to which 
students receive intervention services could depend on compliance decisions made by both school 
staff (such as superintendents, principals, and teachers) and students. Similarly, in a health setting, 
compliance decisions can be made by hospital staff as well as patients. Under this scenario, there are 
16 possible compliance groups rather than 4 as for the non-clustered design. 

Schochet and Chiang (2011) discuss the identification of the CACE parameter for multilevel models
that generalizes the SUTVA, monotonicity, and exclusion restriction assumptions discussed for the
non-clustered design in Chapter 1, Section 1. Under these assumptions, Schochet and Chiang show
that the CACE parameter for clustered designs can be consistently estimated using the same general
methods as for Design 1 by dividing the estimated ATEs for the outcomes ($\hat{\beta}$) by the estimated
ATEs for the service receipt variables ($\hat{p}_{CL}$). Furthermore, the Taylor series approximation in (5.57),
applied to clustered designs, can be used to calculate standard errors of the CACE estimates.
Accordingly, RCT-YES uses (5.57) for standard error estimation where the variance terms are
calculated using the pertinent Design 3 variance formulas for $\hat{\beta}$ and $\hat{p}_{CL}$. To estimate the
covariance term in (5.57), we first define $d_{ij}$ to be an observed indicator variable that equals 1 if an
individual received intervention services and 0 otherwise. The covariance term in (5.57) can then be
estimated as follows using the FP model as an example:

In this expression, $\bar{d}_j$ is the service receipt rate in school $j$, and the fitted school-level values of the
outcome and service receipt variables are either variable means for the simple differences-in-means estimator or predicted values from
fitted school-level regression models with baseline covariates (where RCT-YES uses the same
covariates for the service receipt and outcome variable regressions). The same approach is used for
all Design 3 FP and SP models and subgroup analyses.
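
The ratio estimator and its Taylor series standard error can be sketched as follows. This is a minimal illustration that assumes the approximation referenced as (5.57) takes the standard first-order (delta method) form for a ratio of two estimates. The function name, argument names, and input values are hypothetical.

```python
from math import sqrt


def cace_with_se(beta_hat, p_hat, var_beta, var_p, cov_beta_p):
    """Ratio (CACE) estimate and its first-order Taylor series standard error.

    beta_hat   : estimated ATE on the outcome
    p_hat      : estimated ATE on the service receipt rate
    var_beta   : estimated variance of beta_hat
    var_p      : estimated variance of p_hat
    cov_beta_p : estimated covariance between beta_hat and p_hat
    """
    if p_hat == 0:
        raise ValueError("the service receipt impact must be nonzero")
    cace = beta_hat / p_hat
    # Delta method for a ratio:
    # Var(b/p) ~ [Var(b) + (b/p)^2 Var(p) - 2 (b/p) Cov(b, p)] / p^2
    var_cace = (var_beta + cace ** 2 * var_p - 2.0 * cace * cov_beta_p) / p_hat ** 2
    return cace, sqrt(var_cace)


# Hypothetical inputs: a 3-point outcome impact and a 75-point take-up impact.
cace, se = cace_with_se(beta_hat=3.0, p_hat=0.75, var_beta=1.0,
                        var_p=0.0025, cov_beta_p=0.01)
```

A positive covariance between the two impact estimates reduces the standard error, which is why the covariance term is estimated rather than set to zero in the default specification.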

RCT-YES users can estimate the CACE parameter by specifying the names of the service receipt
variables using the GOT_TREAT input variable. Importantly, these variables must be binary if the
data contain individual-level records (CLUSTER_DATA=1) but must be continuous service receipt 
rates with values between 0 and 1 if the data contain cluster-level averages (CLUSTER_DATA=0). 

8. Design 4: The clustered, blocked design

This chapter discusses ATE estimators under the Neyman-Rubin-Holland model for Design 4 where 
schools (or other clusters) are randomly assigned to the treatment or control conditions within 
blocks (for example, sites). An example of this design is the Evaluation of Mandatory-Random 
Student Drug Testing (James-Burdumy et al. 2010), where 36 schools in seven districts were
randomly assigned to a treatment or control group. Students in the treatment schools who 
participated in specific extracurricular activities were subject to random in-school drug testing, 
whereas students in control schools were not. This evaluation was a clustered, blocked design 
because study schools were randomly assigned separately within each school district (block). 

The data requirements for Design 4 combine those for Designs 2 and 3. Of particular importance,
for the default FP specification, RCT-YES performs the analysis using only blocks that contain at
least 2 treatment and 2 control schools. Thus, if the sample contains many small blocks, RCT-YES
users might consider using the SP specification or the FP specification with the BLOCK_FE=1
option, both of which require only 1 treatment and 1 control school per block.

The estimation methods for Design 4 also combine those for Designs 2 and 3. Furthermore, the
methods for moving from Design 3 to Design 4 for the clustered design are similar to those for 
moving from Design 1 to Design 2 for the non-clustered design. Thus, we provide much less detail 
on the ATE estimators for Design 4 than for the previous designs. 

In this chapter, we use the same notation as in previous chapters. The subscript "i" refers to
students, "j" to schools, "b" to blocks, and "g" to subgroups. Thus, for example, $T_{jb}$ is the
treatment status indicator variable for school $j$ in block $b$ and $S_{ijb}$ is an indicator variable signifying
whether a student is in block $b$. The student-level weight is $w_{ijb}$, the school-level weight is
$w_{jb}$, and the block-level weight is $w_b$.

In what follows, we first discuss estimators for the FP model and then for the SP model.


a. FP model without baseline covariates 

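Full sample analysis. The ATE parameter for the clustered, blocked design is

(8.1) $\beta_{clus,blocked,FP} = \dfrac{\sum_{b=1}^{h} w_b \beta_{clus,b,FP}}{\sum_{b=1}^{h} w_b}$,

where $\beta_{clus,b,FP} = \sum_{j} w_{jb}(\bar{Y}_{jb}(1) - \bar{Y}_{jb}(0)) / \sum_{j} w_{jb}$ is the ATE parameter for block $b$, and
$\bar{Y}_{jb}(1)$ and $\bar{Y}_{jb}(0) = (\sum_{i} Y_{ijb}(0)/n_{jb})$ are school-level mean potential
outcomes in the treatment and control conditions, respectively. This ATE parameter is a weighted
average of the block-specific ATE parameters, which, in turn, are weighted averages of school-specific
ATE parameters within the blocks.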

Similar to Design 3, the default weights in RCT-YES for Design 4 are $w_{ijb} = (1/n_{jb})$, $w_{jb} = 1$, and
$w_b = m_b$, so that schools are weighted equally within blocks and blocks are weighted by their
numbers of study schools. Another weighting scheme for the FP model is to weight blocks equally
($w_{ijb} = (1/n_{jb}m_b)$, $w_{jb} = (1/m_b)$, and $w_b = 1$), which can be implemented in RCT-YES by including
a weight variable in the input data file. Another option is to weight students equally ($w_{ijb} = 1$,
$w_{jb} = n_{jb}$, and $w_b = n_b$).

In a blocked design, random assignment is conducted separately within each block. Thus, ATE
estimators discussed in Chapter 7 for Design 3 apply for each block separately. Accordingly, within
each block, the simple differences-in-means estimator in (7.4), $\hat{\beta}_{clus,b,FP} = (\bar{y}_{TbW} - \bar{y}_{CbW})$, is a
consistent estimator for the block-specific ATE parameter in (7.8). It follows that a consistent
estimator for the pooled ATE parameter for Design 4 is

(8.2) $\hat{\beta}_{clus,blocked,FP} = \dfrac{\sum_{b=1}^{h} w_b \hat{\beta}_{clus,b,FP}}{\sum_{b=1}^{h} w_b}$.

Because the samples across blocks are independent, a consistent variance estimator for
$\hat{\beta}_{clus,blocked,FP}$ is

(8.3) $AsyVar_R(\hat{\beta}_{clus,blocked,FP}) = \dfrac{\sum_{b=1}^{h} w_b^2 AsyVar_R(\hat{\beta}_{clus,b,FP})}{(\sum_{b=1}^{h} w_b)^2}$,

where $AsyVar_R(\hat{\beta}_{clus,b,FP})$ is calculated using (7.12) and (7.13) applied to each block
separately. Furthermore, $\hat{\beta}_{clus,blocked,FP}$ is asymptotically normal as the number of schools per block
goes to infinity (which may not be realistic in some settings), because it is a weighted sum of
independent, asymptotically normal random variables. Thus, t-tests can be used for hypothesis
testing with $(\sum_{b=1}^{h}(m_{Tb} + m_{Cb}) - 2h)$ degrees of freedom, where $h$ is the number of blocks.
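
The pooling step can be sketched in a few lines. The example below assumes that block-specific impact estimates and their variances have already been computed, and it pools them as a weighted average whose variance is the squared-weight sum of the block variances divided by the squared total weight, which is what independence across blocks implies. Function names and inputs are hypothetical.

```python
def pool_block_estimates(estimates, variances, weights):
    """Pool independent block-specific impact estimates with block weights w_b."""
    total_w = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_w
    # Independence across blocks: the variance of a weighted average is the
    # sum of w_b^2 * Var_b, divided by the squared sum of the weights.
    pooled_var = sum(w ** 2 * v for w, v in zip(weights, variances)) / total_w ** 2
    return pooled, pooled_var


def pooled_degrees_of_freedom(m_treat, m_control):
    """Total schools across blocks minus 2 per block (no covariates)."""
    h = len(m_treat)
    return sum(m_treat) + sum(m_control) - 2 * h


# Two hypothetical blocks weighted by their numbers of schools (6 and 4).
pooled, pooled_var = pool_block_estimates([2.0, 4.0], [1.0, 0.5], [6, 4])
df = pooled_degrees_of_freedom([3, 2], [3, 2])
```

With these inputs the larger block dominates the pooled estimate, and each block costs two degrees of freedom because a treatment and a control mean are estimated within it.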

If the BLOCK_FE=1 option is specified, RCT-YES estimates the following regression model that
includes block indicator variables but excludes block-by-treatment status interaction terms:

(8.4) $\bar{y}_{jb} = \alpha_1 \tilde{T}_{jb} + \sum_{b=1}^{h} \delta_b S_{jb} + \eta_{jb}$,

where $\tilde{T}_{jb} = (T_{jb} - p_b)$, $S_{jb}$ is the block indicator, and $\eta_{jb}$ is the error term.


Using methods from Chapter 6 for Design 2, the weighted least squares estimator for the treatment coefficient in (8.4) can
be shown to be a weighted average of block-specific impacts with weights $[(1/m_{Tb}) + (1/m_{Cb})]^{-1}\bar{w}_b$,
where $\bar{w}_b = (\sum_{j=1}^{m_b} w_{jb}/m_b)$ is the average school-level weight in block $b$. This approach is similar

to weighting blocks by the inverses of their variances. In general, this estimator is biased for the FP 
ATE parameter. Nonetheless, it could be a parsimonious specification for models with small 
numbers of schools per block. 
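
The weighted-average interpretation of the block fixed effects estimator can be checked numerically in the special case of equal school weights ($w_{jb} = 1$), which is an assumption made here for simplicity. In that case the coefficient on treatment status from a regression with block indicators equals the average of block-specific differences in means with weights $[(1/m_{Tb}) + (1/m_{Cb})]^{-1}$. The sketch below computes both quantities. All names and data are hypothetical, and the regression coefficient is obtained by demeaning the outcome and treatment indicator within blocks rather than by fitting the full model.

```python
from statistics import mean


def _by_block(y, t, block):
    """Group (outcome, treatment) pairs by block identifier."""
    groups = {}
    for y_j, t_j, b_j in zip(y, t, block):
        groups.setdefault(b_j, []).append((y_j, t_j))
    return groups


def block_fe_estimate(y, t, block):
    """Treatment coefficient from OLS with block indicators (no interactions),
    computed by demeaning the outcome and treatment within blocks."""
    num = den = 0.0
    for rows in _by_block(y, t, block).values():
        y_bar = mean(r[0] for r in rows)
        p_b = mean(r[1] for r in rows)
        for y_j, t_j in rows:
            num += (t_j - p_b) * (y_j - y_bar)
            den += (t_j - p_b) ** 2
    return num / den


def weighted_block_average(y, t, block):
    """Average of block impacts with weights 1 / (1/m_Tb + 1/m_Cb)."""
    num = den = 0.0
    for rows in _by_block(y, t, block).values():
        treated = [y_j for y_j, t_j in rows if t_j == 1]
        control = [y_j for y_j, t_j in rows if t_j == 0]
        weight = 1.0 / (1.0 / len(treated) + 1.0 / len(control))
        num += weight * (mean(treated) - mean(control))
        den += weight
    return num / den


# Hypothetical school-level mean outcomes in two blocks of 4 and 6 schools.
y = [10.0, 12.0, 9.0, 8.0, 20.0, 15.0, 14.0, 13.0, 16.0, 12.0]
t = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
block = ["A"] * 4 + ["B"] * 6
fe = block_fe_estimate(y, t, block)
avg = weighted_block_average(y, t, block)
```

The two numbers agree, which illustrates why this specification generally targets a different weighted average than the FP ATE parameter unless block impacts are homogeneous or the weights coincide.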


Using results from Chapter 6 for Design 2, $\hat{\alpha}_1$ can be shown to be asymptotically normal with
asymptotic variance that can be estimated as follows:

(8.5) $AsyVar_R(\hat{\alpha}_1) = \dfrac{\frac{1}{m(m-h-1)}\sum_{b=1}^{h}\sum_{j=1}^{m_b} w_{jb}^2 (T_{jb} - p_b)^2 (\bar{y}_{jb} - \hat{\alpha}_1(T_{jb} - p_b) - \hat{\delta}_b)^2}{[\sum_{b=1}^{h} q_b p_b(1 - p_b)\bar{w}_b]^2}$,

where $q_b = (m_b/m)$ is the proportion of all schools in block $b$. For this specification, RCT-YES
conducts hypothesis testing using t-tests with $(\sum_{b=1}^{h}(m_{Tb} + m_{Cb}) - h - 1)$ degrees of freedom.

To assess baseline equivalence, if BLOCK_FE=0, RCT-YES conducts t-tests using (7.17) where the 
treatment-control covariate differences and pooled variances are calculated for each block separately 
and weighted to yield overall values. If BLOCK_FE=1, the program uses the variance estimator in
(8.5). The joint test of baseline equivalence across all covariates is conducted using Hotelling’s
T-squared statistic in (7.17a) for Design 3.

Subgroup analysis . RCT-YES estimates impacts for subgroups using similar methods as for the full 
sample. By default, the program calculates the ATE and variance estimators in (7.15) and (7.16) 
separately for each subgroup and block, and then averages the block-specific estimators to obtain 
pooled subgroup estimators. If the BLOCK_FE=1 option is used for the subgroup analysis, RCT-YES
estimates the following regression model using OLS:

(8.6)

where $\tilde{T}_{jgb} = (T_{jb} - p_{gb})$ and $\eta_{jgb}$ is the error term. The variance estimator for the subgroup impact estimator, $\hat{\beta}_g$, is:

(8.7) $AsyVar_R(\hat{\beta}_g) = \dfrac{\frac{1}{m_g(m_g-h-1)}\sum_{b=1}^{h}\sum_{j: G_{jg}=1} w_{jgb}^2 (T_{jb} - p_{gb})^2 (\bar{y}_{jgb} - \hat{\beta}_g(T_{jb} - p_{gb}) - \hat{\delta}_{gb})^2}{[\sum_{b=1}^{h} q_{gb} p_{gb}(1 - p_{gb})\bar{w}_{gb}]^2}$,

where $\bar{w}_{gb} = (\sum_{j: G_{jg}=1} w_{jgb}/m_{gb})$, $p_{gb} = (m_{Tgb}/m_{gb})$, and $q_{gb} = (m_{gb}/m_g)$. For this estimator,
the degrees of freedom for hypothesis testing is $(\sum_{b=1}^{h}(m_{Tgb} + m_{Cgb}) - h - 1)$.


b. FP model with baseline covariates 

Full sample analysis. To obtain regression-adjusted estimators for Design 4, by default, RCT-YES estimates
the following regression model using weighted least squares, where centered baseline covariates,
$\tilde{x}_{jb} = (\bar{x}_{jb} - \bar{x}_{bW})$, are included as explanatory variables in the model with associated parameter
vector $\gamma$:

(8.8) $\bar{y}_{jb} = \sum_{b=1}^{h} \beta_{clus,b,FP} S_{jb}\tilde{T}_{jb} + \sum_{b=1}^{h} \delta_b S_{jb} + \tilde{x}_{jb}\gamma + \eta_{jb}$.

Using results and methods from previous chapters (see, for example, Lemmas 6.1 and 7.2), the
weighted least squares estimator, $\hat{\beta}_{clus,b,MR,FP}$, can be shown to be consistent and asymptotically
normal for each block, and the asymptotic variance of $\hat{\beta}_{clus,b,MR,FP}$ can be estimated as follows:


MSB. 


(8.9) AsyVarg{Pcius,h,MR,Fp) — —2 "*■ —2 


ChW 


w^mp^q, Wam{l-p,)q, mq^ w, 


1 Y 


Tb 


W, 


Cb 


where 


MSE. 


ThW 


{m-' 


1 * /V /V ~ 

T 7 Z ^jb jb ~ Pb )Pclus,h,MR,FP ~ ^ jbY ) 

^)Pb(lb ^ _/:5,.i=l,7’,=l 






and qi, = {nif^ / m) 

The block-specific ATE and variance estimators can then be inserted into (8.2) and (8.3) to obtain
pooled estimators across all blocks. RCT-YES conducts hypothesis testing for the pooled estimator
using t-tests with $(\sum_{b=1}^{h}(m_{Tb} + m_{Cb}) - v - 2h)$ degrees of freedom.

If the BLOCK_FE=1 option is specified, RCT-YES estimates the regression model in (8.4) where the
explanatory variables include the centered baseline covariates. The program uses the following
variance estimator for the resulting ATE estimator, $\hat{\alpha}_{1,MR}$:

(8.10) $AsyVar_R(\hat{\alpha}_{1,MR}) = \dfrac{\frac{1}{m(m-h-v-1)}\sum_{b=1}^{h}\sum_{j=1}^{m_b} w_{jb}^2 (T_{jb} - p_b)^2 (\bar{y}_{jb} - \hat{\alpha}_{1,MR}(T_{jb} - p_b) - \hat{\delta}_b - \tilde{x}_{jb}\hat{\gamma})^2}{[\sum_{b=1}^{h} q_b p_b(1 - p_b)\bar{w}_b]^2}$.

The degrees of freedom for hypothesis testing is $(\sum_{b=1}^{h}(m_{Tb} + m_{Cb}) - v - h - 1)$.

Subgroup analysis . For subgroup analyses with covariates, RCT-YES estimates the following model 
where the centered covariates are included as additional model regressors: 

(8.11)

where $\tilde{T}_{jgb} = (T_{jb} - p_{gb})$ and $\eta$ is the error term. In this formulation, $\beta_{clus,g,b,MR,FP}$ is the
ATE parameter for subgroup $g$ in block $b$. The variance estimator for the corresponding estimator is


(8.12) Asy Varj^ (Pgb,MR ) = 


TgbW _ 1 JySE^_yMSE CgbW Y 


MSE, 


^Igb^Tgb ^Cgb^Cgb ^gb ^Tgb 


W, 


Cgb 


where 


MSE. 


TgbW 




ntgb 

T 21 




=i,r,.=i 


^Ibiyjgb- Pgb,MR^^~ Pgb)-^gb~^\^xy^) 


MSE. 


CgbW 


7 \ n \ f Z "^jgb (Y jgb + PgbMPPgb ~ ^gb ~ Xjgb Y) > 

- vK (1 - )g^, - 1 ,:G .,=l.t^=lT,=0 


- (^j,G.^=iy.^=x^jgbl^Tgb) - and W^gb - (Z 7 :G%i;r,.,= 0 ^A* ^"^Cg6) • 


Tg6 

The regression-adjusted block-specific impact estimates and variances can then be weighted to yield
overall ATE estimates for each subgroup. RCT-YES conducts hypothesis testing for the pooled
subgroup estimator using t-tests with $(\sum_{b=1}^{h}(m_{Tgb} + m_{Cgb}) - vq_g - 2h)$ degrees of freedom, where
$q_g = (m_g/m)$.


Finally, if the BLOCK_FE=1 option is specified for the subgroup analysis with covariates, RCT-YES
estimates (8.6) using the centered covariates and calculates the following variance estimator:


h "V> 


(8.13) AsyVarXp^j^g) 


Z Z < -PJ (yjgt - Kmr - T,, ) - - hJf 

b=\ 7:G, =1 


{m^ —h-v-\) 




For this estimator, the degrees of freedom for hypothesis testing is $(\sum_{b=1}^{h}(m_{Tgb} + m_{Cgb}) - vq_g - h - 1)$.


c. SP model without baseline covariates 

Similar to Designs 2 and 3, there are several ATE parameters for the SP specification for Design 4 
that depend on sampling assumptions regarding study blocks, schools, and students from broader 
populations. To keep the number of possibilities manageable, we typically invoke the same sampling 
assumptions for schools and students. 

By default, RCT-YES estimates the PATE parameter that assumes random sampling at each level. If 
the CATE_UATE option is set to 1, RCT-YES estimates the CATE parameter that assumes fixed 
blocks but a random sample of schools and students within blocks. If the CATE_UATE option is 
set to 2, RCT-YES estimates the UATE parameter that assumes a random sample of blocks but fixed
study schools and students within blocks. 

RCT-YES uses the same methods for estimating impacts for all three SP parameters by: (1) estimating 
weighted simple differences-in-means estimators in each block and (2) calculating a weighted average 
of the block-specific estimators to obtain pooled estimators. Variance estimation, however, differs 
somewhat for the three SP parameters. In what follows, we first discuss estimators for the CATE 
parameter and then for the PATE and UATE parameters.

The CATE parameter for Design 4 is

(8.14) $\beta_{clus,blocked,CATE} = \dfrac{\sum_{b=1}^{h} w_b \left[ E_S(w_{jb}[\mu_{Tjb} - \mu_{Cjb}]) / E_S(w_{jb}) \right]}{\sum_{b=1}^{h} w_b}$,

where $\mu_{Tjb} = E_I(Y_{ijb}(1))$ and $\mu_{Cjb} = E_I(Y_{ijb}(0))$ are mean potential outcomes in the student
super-population within schools. This parameter is a weighted average of the PATE parameter from Design
3 across the fixed blocks. Accordingly, we could use the variance estimator in (7.30) for the Design
3 PATE parameter that is applied separately to each block, and these estimators could then be
averaged to calculate the pooled variance estimator using (8.3). RCT-YES, however, instead uses the
simpler variance estimator for the FP model for Design 4 (excluding the FP heterogeneity terms)
under the assumption that students in the sample are fixed for the study. This same approach is
used for the subgroup analysis and the baseline equivalency analysis.

If the BLOCK_FE=1 option is specified for the CATE parameter, RCT-YES estimates the model in
(8.4) and uses the variance estimator in (8.5) (and similarly for the baseline equivalency analysis).
For both the BLOCK_FE=0 and 1 specifications, RCT-YES conducts the joint test of baseline
equivalency using the covariances in (7.17a) for Design 3.

The PATE parameter for Design 4 can be obtained from the CATE parameter in (8.14) by averaging
over the sampling of blocks: $\beta_{clus,blocked,PATE} = E_S(\beta_{clus,blocked,CATE})$. The asymptotic variance of the
simple weighted differences-in-means estimator for the PATE parameter (as the number of blocks
gets large) can be obtained using similar methods as for Lemma 6.2 (the PATE parameter for Design
2) and Lemma 7.3 (the PATE parameter for Design 3) and can be expressed as follows:


(8.15) AsyVar,^^{!3^,^^ 

,blocked ,PATE ) = Varj^[w^ 

+ Eb[ 


[ ^jb iMrib Ecjb ) ] 


'\ + Ej,[wlVar^ 


w. 




2 


z 

7=1 


Esi^jb) 

^]b\(\-Pb)(^ljb+Pb(^qb'\ 


^jbiPTJb -Pcjb) 
Esi^jb) 


] 


y-b 


I 


where $\sigma^2_{Tjb} = Var_I(Y_{ijb}(1))$ and $\sigma^2_{Cjb} = Var_I(Y_{ijb}(0))$ are variances of potential outcomes from $I$. In

this expression, the first term is the variance of ATEs across blocks, the second term is the variance 
of ATEs across schools within blocks, and the third term is the variance of ATEs within schools. 
This variance structure aligns with the three-stage sampling assumption for the PATE parameter. 


A consistent estimator for the asymptotic variance in (8.15) is

(8.16) $AsyVar(\hat{\beta}_{clus,blocked,PATE}) = \dfrac{1}{(h-1)h\bar{w}^2}\sum_{b=1}^{h}(w_b\hat{\beta}_{clus,b,PATE} - \bar{w}\hat{\beta}_{clus,blocked,PATE})^2$.

This variance estimator represents the variation of estimated ATEs across blocks, and can be proved
using the same methods as for Lemma 6.2. It is interesting that this is the same variance estimator
as for the PATE parameter for Design 2 with student-level random assignment. This occurs because
the assumed primary sampling unit for the PATE parameter is the block for both the clustered and
non-clustered designs, and the variance across blocks captures the variances of lower-level sampling
units. RCT-YES conducts hypothesis testing for this specification using t-tests with $(h - 1)$ degrees of
freedom. The program also uses (8.16) for the baseline equivalency analysis and a version of (6.25a)
adapted to Design 4 for the joint test.
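
The between-block variance calculation can be sketched as follows. This is a minimal illustration that assumes the estimator is the sum of squared deviations of the weighted block impacts from the weighted pooled impact, divided by $(h-1)h\bar{w}^2$, and that $\bar{w}$ denotes the mean block weight (an assumption, since its definition is not restated in this section). Function and variable names are hypothetical.

```python
def pate_block_variance(block_estimates, block_weights):
    """Pooled impact, between-block variance estimate, and degrees of freedom.

    block_estimates : block-specific impact estimates
    block_weights   : block weights w_b
    """
    h = len(block_estimates)
    if h < 2:
        raise ValueError("need at least 2 blocks")
    w_bar = sum(block_weights) / h  # assumed definition: mean block weight
    pooled = (sum(w * b for w, b in zip(block_weights, block_estimates))
              / sum(block_weights))
    # Variation of the weighted block impacts around the weighted pooled impact.
    ss = sum((w * b - w_bar * pooled) ** 2
             for w, b in zip(block_weights, block_estimates))
    var_hat = ss / ((h - 1) * h * w_bar ** 2)
    return pooled, var_hat, h - 1


# Three hypothetical, equally weighted blocks.
pooled, var_hat, df = pate_block_variance([1.0, 3.0, 2.0], [1.0, 1.0, 1.0])
```

With equal block weights the result reduces to the sample variance of the block impacts divided by the number of blocks, so precision under this specification is driven by the number of blocks rather than the number of schools or students.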

The UATE parameter for Design 4 can be obtained from the FP parameter in (8.1) by averaging
over the sampling of blocks: $\beta_{clus,blocked,UATE} = E_S(\beta_{clus,blocked,FP})$. Using (8.15), the asymptotic
variance of the simple weighted differences-in-means estimator for the UATE parameter is

^5[w,,(r,,(i)-7^,(0))] 1 
Esiwjb) 

RCT-YES estimates this variance using the PATE variance estimator in (8.16). 

d. SP model with baseline covariates 

To obtain regression-adjusted estimators for the SP model for Design 4, RCT-YES adapts the regression
estimators from previous models. For the CATE parameter in (8.14), the program uses the same
approach as for the FP model for Design 4 (see (8.8) and (8.9)). For the PATE and UATE parameters,
RCT-YES uses the estimation approach discussed in Section 6d for the PATE parameter for Design
2 (see (6.27) to (6.30)).

e. Matched pair designs 

For Design 4, matched pair designs occur if similar schools are paired using observable baseline 
measures and one school in a pair is then randomly assigned to the treatment group and the other 
to the control group. The pairing is done separately within each block (for example, each school 
district). These designs are common for clustered designs in education research when there are small 
numbers of schools per block, because they can help avoid the possibility of a “bad draw” where the 
treatment and control groups differ along important dimensions due to chance. In clustered designs, 
a critical matching variable is the cluster size to help minimize bias of the impact estimates (Imai, 
King, and Nall, 2009; Imbens, 2011). 

The differences-in-means estimators for Design 4 produce consistent estimates for matched pair 
designs. However, as discussed in Chapter 6, Section e, without further assumptions, variance 
estimators for the FP model are not identifiable because each pair contains only 1 treatment and 1 
control group school. To address this issue, RCT-YES estimates variances for the matched pair design 
using the SP variance estimator for the Design 4 PATE parameter (see Section 8d above). 


( 8 . 17 ) Var,[w^ 

f. The CACE parameter

For Design 4, RCT-YES estimates the CACE parameter (if requested) using the same general
methods as for Design 3. The program obtains CACE estimates by dividing the estimated ATEs for
the outcomes ($\hat{\beta}$) by the estimated ATEs for the service receipt variables ($\hat{p}_{CL}$). For the default FP
model, the CACE estimates are obtained separately by block and then weighted to obtain pooled
CACE estimates. To calculate standard errors of the CACE estimates, the program uses (5.57) where
the variance terms are calculated using the pertinent Design 4 variance formulas for $\hat{\beta}$ and $\hat{p}_{CL}$.
The covariance term in (5.57) for the default FP model with BLOCK_FE=0 is calculated using

(8.18) $AsyCov_R(\hat{\beta}_{clus,blocked,FP}, \hat{p}_{CL}) = \dfrac{\sum_{b=1}^{h} w_b^2 AsyCov_R(\hat{\beta}_{clus,b,FP}, \hat{p}_{CL,b})}{(\sum_{b=1}^{h} w_b)^2}$,

where $AsyCov_R(\hat{\beta}_{clus,b,FP}, \hat{p}_{CL,b})$ is calculated using (5.58) applied to each block, and similarly for the
CATE model. If the BLOCK_FE=1 option is specified for the FP or CATE models, RCT-YES ignores
the covariance terms in the calculations. For the PATE and UATE models, RCT-YES calculates the
covariance terms using


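(8.19) $AsyCov(\hat{\beta}_{clus,blocked,PATE}, \hat{p}_{CL}) = \dfrac{1}{(h-1)h\bar{w}^2}\sum_{b=1}^{h}(w_b\hat{\beta}_{clus,b,PATE} - \bar{w}\hat{\beta}_{clus,blocked,PATE})(w_b\hat{p}_{CL,b} - \bar{w}\hat{p}_{CL})$,

where $\hat{p}_{CL,b}$ is the estimated ATE on the service receipt rate in block $b$.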


Finally, note that to estimate the CACE parameter for the default FP model, RCT-YES excludes
blocks where there is no cluster-level variation in either the outcome variable or the service receipt
variable for treatment or control clusters. Thus, many blocks could be excluded, for example, if the
overall service receipt rate is very large for treatment schools and very small for control schools. In
this case, users may want to consider the BLOCK_FE=1 option for the FP model.

9. Simulation analysis

This chapter presents results from a simulation analysis to examine the statistical performance of the 
design-based estimator and other commonly-used RCT estimators that all rely on asymptotic 
approximations. The simulations are conducted for a clustered RCT design rather than a non- 
clustered design because finite sample biases are likely to be more prevalent for clustered designs 
where the variance estimators are driven primarily by the number of clusters rather than the number 
of individuals. We focus on three estimators: (1) a design-based estimator that is estimated using the 
methods discussed in Chapters 7 and 8 and the variance formulas in (7.32) and (8.16); (2) an HLM 
maximum likelihood estimator that is estimated using SAS Proc Mixed with student-level data, and 
(3) an OLS robust cluster standard error (RCSE) “sandwich” estimator that is estimated using SAS 
Proc Genmod with student-level data (see Huber, 1967; White, 1980; Liang and Zeger, 1986; and
Diggle, Liang, and Zeger, 1994). For the RCSE estimator, we do not apply the small sample bias
corrections found in the literature.

Our focus is on a clustered design where schools are randomized (Design 3). However, we also 
conduct simulations for the design-based SP estimator where schools are randomized separately 
within randomly sampled districts (Design 4), because standard errors for this estimator have a 
different structure than the other standard error estimators considered in this report, and thus, it is 
important to assess its performance. 

a. Simulation methods 

Methods for Design 3. For Design 3, we conducted the simulations by randomizing schools to a
single treatment or control condition and generating student test score outcome data ($Posttest_{ij}$).
Regression models were used to estimate ATEs where pretest scores ($Pretest_{ij}$) were included as
model covariates to increase the precision of the estimates. The underlying pretest-posttest RCT
model used to generate the simulated data for student $i$ in school $j$ was as follows:

(9.1a) $Posttest_{ij} = \beta T_j + \gamma Pretest_{ij} + (u_j + \theta_j T_j + e_{ij})$

(9.1b) $Pretest_{ij} = 100 + (u_{0j} + e_{0ij})$,

where $\beta$ is the ATE parameter, $u_j$ are independent and identically distributed (iid) random school-level
errors in the posttest model with mean 0 and variance $\sigma_u^2$; $u_{0j}$ are iid $(0, \sigma_{u0}^2)$ school-level
errors in the pretest model; $\theta_j$ are iid $(0, \sigma_\theta^2)$ random errors that capture the heterogeneity of
treatment effects across schools; $e_{ij}$ are iid $(0, \sigma_e^2)$ student-level errors in the posttest model; $e_{0ij}$ are
iid $(0, \sigma_{e0}^2)$ student-level errors in the pretest model; and errors across levels and equations are
assumed to be independent. Because of $\theta_j$, the variances of posttest scores are larger for the
treatment group than control group.

For the simulations, we made the following real-world model parameter assumptions: (1) 60 percent 
of schools are randomized to the treatment group and 40 percent to the control group ($p = .6$); (2) 
the standard deviations of pretest and posttest scores ($\sigma_{score}$) are about 15; (3) the ATE 
parameter is .20 standard deviations so that $\beta = 3$; (4) the squared correlation ($\rho^2$) between 
pretest and posttest scores is .5 at both the school and student levels, which implies that 
$\gamma = \rho = .71$; (5) the intraclass correlation, $ICC = \sigma_u^2/(\sigma_u^2 + \sigma_e^2)$, is .10 
for posttest scores (where we ignore $\sigma_\theta^2$) and similarly for pretest scores; and (6) 
$\sigma_\theta^2 = f\sigma_u^2$ for the treatment group, where $f = .10$. Using these assumptions, we 
calculated the model error variances using the following relations: 
$\sigma_{u0}^2 = \sigma_{score}^2 ICC$, $\sigma_{e0}^2 = \sigma_{u0}^2(1 - ICC)/ICC$, 
$\sigma_e^2 = \sigma_u^2(1 - ICC)/ICC$, $\sigma_\theta^2 = f\sigma_u^2$, and 

(9.2) $\sigma_u^2 = \dfrac{ICC[\sigma_{score}^2(1 - \rho^2) - \beta^2 p(1 - p)]}{1 + (fp)ICC}$ 
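The variance relations above can be checked numerically. The following is a minimal sketch, not the report's own code: it assumes (9.2) has the form $\sigma_u^2 = ICC[\sigma_{score}^2(1-\rho^2) - \beta^2 p(1-p)]/(1 + fp \cdot ICC)$, and the function name and dictionary keys are hypothetical. Under that assumption the posttest variance components add back up to $\sigma_{score}^2 = 225$.

```python
def design3_variances(sd_score=15.0, beta=3.0, p=0.6, rho2=0.5, icc=0.10, f=0.10):
    """Variance components implied by the stated Design 3 assumptions.

    Assumes relation (9.2) as written in the lead-in; all names are illustrative.
    """
    var_score = sd_score ** 2
    # Pretest model (9.1b): split the total variance into school and student parts.
    var_u0 = var_score * icc
    var_e0 = var_u0 * (1 - icc) / icc
    # Posttest model (9.1a): school-level variance from the assumed form of (9.2).
    var_u = icc * (var_score * (1 - rho2) - beta ** 2 * p * (1 - p)) / (1 + f * p * icc)
    var_e = var_u * (1 - icc) / icc
    var_theta = f * var_u
    return {"var_u0": var_u0, "var_e0": var_e0,
            "var_u": var_u, "var_e": var_e, "var_theta": var_theta}


v = design3_variances()
# Total posttest variance: treatment term + pretest term + school, effect, and student errors.
total_posttest = 3.0 ** 2 * 0.6 * 0.4 + 0.5 * 225.0 + v["var_u"] + 0.6 * v["var_theta"] + v["var_e"]
print({k: round(val, 3) for k, val in v.items()}, round(total_posttest, 6))
```

The check on `total_posttest` is the design rationale for (9.2): the school-level variance is chosen so that the overall posttest variance matches the assumed score variance.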

Finally, to generate unbalanced designs, student sample sizes were allowed to vary across schools and 
to be positively correlated with posttest scores. Specifically, we drew the student sample size from a 
Uniform(10,40) distribution if $u_j > 0$ and from a Uniform(5,20) distribution if $u_j < 0$ (rounded to 
the nearest integer), yielding a correlation coefficient of about .15-.20 between school size and 
student posttest scores. 
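As an illustration of the data-generating steps for Design 3, the sketch below draws one simulated sample with normal errors. It is an assumption-laden outline rather than the report's procedure: the function name, argument names, and default variance values (rounded from the stated parameter assumptions) are hypothetical, and the numbers of treatment and control schools are passed in directly.

```python
import math
import random


def simulate_design3(n_treat, n_control, rng, beta=3.0, gamma=math.sqrt(0.5),
                     var_u=10.97, var_e=98.71, var_theta=1.10,
                     var_u0=22.5, var_e0=202.5):
    """Return rows (school, treatment, pretest, posttest) for one replication."""
    treat = [1] * n_treat + [0] * n_control
    rng.shuffle(treat)  # random assignment of schools
    rows = []
    for j, t in enumerate(treat):
        u = rng.gauss(0.0, math.sqrt(var_u))          # posttest school error
        theta = rng.gauss(0.0, math.sqrt(var_theta))  # school-specific effect deviation
        u0 = rng.gauss(0.0, math.sqrt(var_u0))        # pretest school error
        # School size depends on the sign of u, which induces the size-score correlation.
        low, high = (10, 40) if u > 0 else (5, 20)
        n_j = round(rng.uniform(low, high))
        for _ in range(n_j):
            pre = 100.0 + u0 + rng.gauss(0.0, math.sqrt(var_e0))
            post = beta * t + gamma * pre + u + theta * t + rng.gauss(0.0, math.sqrt(var_e))
            rows.append((j, t, pre, post))
    return rows


rows = simulate_design3(5, 3, random.Random(7))
print(len(rows), "students in", len({r[0] for r in rows}), "schools")
```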

Separate simulations, with 10,000 replications each, were conducted assuming total samples of 8, 
12, 16, 20, 40, and 60 schools (statistical precision in clustered designs is usually driven primarily by 
the number of clusters rather than the number of individuals per cluster). Separate simulations were 
conducted assuming that $u_j$, $u_{0j}$, $\theta_j$, $e_{ij}$, and $e_{0ij}$ had (1) normal distributions 
to align with the HLM assumptions; (2) bimodal normal distributions for the school-level errors where 
$u_j \sim N(\sigma_u, \sigma_u^2/2)$ with probability .5 and $u_j \sim N(-\sigma_u, \sigma_u^2/2)$ with 
probability .5 and normal distributions for the student-level errors with $e_{ij} \sim N(0, \sigma_e^2/2)$ 
(and similarly for the pretest model); and (3) mean-zero chi-squared distributions for all errors. We 
used bimodal and chi-squared distributions to allow for skewness and some misspecification in the HLM 
framework. In addition, to allow for additional model misspecification, we conducted simulations where 
the regression model was estimated controlling for the natural logarithm of the pretest scores rather 
than the linear pretest scores that were used to generate the data. 
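The non-normal error distributions can be drawn along the following lines. This is a hedged sketch: the helper names are hypothetical, the mixture centers and component spreads are left as caller-supplied arguments, and the chi-squared degrees of freedom (`df=2`) is an assumed value because the text does not state it.

```python
import math
import random


def bimodal_normal(rng, center, component_sd):
    """Equal-weight mixture of N(+center, sd^2) and N(-center, sd^2); the mean is zero."""
    sign = 1.0 if rng.random() < 0.5 else -1.0
    return rng.gauss(sign * center, component_sd)


def centered_chi_squared(rng, sd, df=2):
    """Chi-squared(df) draw shifted to mean zero and rescaled to standard deviation sd.

    df is an assumed value, not taken from the report.
    """
    x = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) is Gamma(shape=df/2, scale=2)
    return sd * (x - df) / math.sqrt(2.0 * df)


rng = random.Random(2015)
chi_draws = [centered_chi_squared(rng, sd=3.0) for _ in range(50000)]
bi_draws = [bimodal_normal(rng, center=1.0, component_sd=0.5) for _ in range(50000)]
```

The centered chi-squared draws are right-skewed with mean zero, which is the feature the misspecification checks rely on.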

To examine the statistical properties of the considered estimators, we calculated finite sample biases 
of the estimated ATEs and their standard errors (we do not consider statistical power). For this 
analysis, we stored the 10,000 replicated values of $\hat{\beta}$ and their estimated standard errors for 
each specification. To examine biases in the estimated ATEs, we calculated average values of the 10,000 
$\hat{\beta}$ estimates and compared them to the true value of $\beta = 3$. To examine biases in the 
estimated standard errors, we compared average empirical values of the standard errors produced by the 
estimators to their “true” sampling variability as measured by the standard deviations of the 
$\hat{\beta}$ estimates. Finally, we conducted simulations under the null hypothesis of no average 
treatment effect ($\beta = 0$) and calculated the proportion of t-statistics that were statistically 
significant across the 10,000 replications to examine nominal Type 1 error levels (using a 5 percent 
significance level and a two-tailed test). 
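The performance measures described here reduce to a few summary statistics over the stored replications. The sketch below is illustrative only: the function name is hypothetical, and the critical value of 1.96 is an assumption (the text does not state which reference distribution or degrees of freedom were used for the t-tests).

```python
import statistics


def summarize_replications(estimates, std_errors, true_beta, crit=1.96):
    """Bias, "true" standard error, average estimated standard error, and rejection rate.

    The rejection rate is a Type 1 error rate only when true_beta is 0.
    crit=1.96 is an assumed two-tailed 5 percent critical value.
    """
    mean_est = statistics.fmean(estimates)
    rejections = sum(abs(b / s) > crit for b, s in zip(estimates, std_errors))
    return {
        "bias": mean_est - true_beta,
        "true_se": statistics.stdev(estimates),   # SD of estimates across replications
        "avg_se": statistics.fmean(std_errors),   # average of estimated standard errors
        "rejection_rate": rejections / len(estimates),
    }


out = summarize_replications([1.0, -1.0, 3.0, -3.0], [1.0, 1.0, 1.0, 1.0], true_beta=0.0)
print(out)
```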

Methods for Design 4. The design-based standard error estimator for the Design 4 SP model has a 
different structure than the standard error estimators for the other designs considered in this report. 
Thus, to assess the performance of this estimator, we conducted simulations where data were 
generated assuming the randomization of schools within randomly sampled study sites. The 
simulations were conducted using the following specifications: (1) 4, 8, 12, 16, 20, or 40 study sites; 
(2) the number of schools per site is drawn from a Uniform(4,6) distribution if $\eta_b > 0$ and from a 
Uniform(5,8) distribution if $\eta_b < 0$ (rounded to the nearest integer), where $\eta_b$ are site-level 
errors defined below; (3) half the schools are randomized to the treatment group; (4) student sample sizes 
within study schools are generated using the same approach as described above for Design 3; and (5) 
model errors are assumed to have normal, bimodal, and mean-zero chi-squared distributions as 
described above for Design 3. Standard errors for the simulations were calculated using (8.16) and 
(6.28). 

We used the following underlying Design 4 model to generate the simulated data for school j in 
site b that builds on the model for Design 3: 

(9.3a) $Posttest_{jb} = \beta T_{jb} + \gamma Pretest_{jb} + (\eta_b + \tau_b T_{jb} + u^*_{jb} + \theta_{jb} T_{jb} + \bar{e}_{jb})$ 

(9.3b) $Pretest_{jb} = 100 + (\eta_{0b} + u^*_{0jb} + \bar{e}_{0jb})$, 

where $\eta_b$ are iid $(0, \sigma_\eta^2)$ site-level errors in the posttest model, $\tau_b$ are iid 
$(0, \sigma_\tau^2)$ errors that represent the variation in ATEs across sites, $u^*_{jb}$ are iid 
$(0, \sigma_{u^*}^2)$ school-level errors, $\bar{e}_{jb}$ are average student-level errors, $\theta_{jb}$ 
are defined as above for the Design 3 simulation model, and similarly for the errors in the pretest 
model. In this specification, the key variance component for the design-based estimator is 
$\sigma_\tau^2$. For the simulations, values for $\beta$, $\sigma_{score}^2$, and $\gamma$ were set to the 
same values as for the Design 3 simulations. Letting $\delta_{jb} = \eta_b + u^*_{jb}$, we obtained values 
for $\sigma_\delta^2$ using (9.2) where we replaced $\sigma_u^2$ with 
$\sigma_\delta^2 = \sigma_\eta^2 + \sigma_{u^*}^2$ and $ICC = .10$ with 
$ICC_\delta = \sigma_\delta^2/(\sigma_\delta^2 + \sigma_e^2) = .20$. We calculated the other variance 
components using the relations $\sigma_e^2 = \sigma_\delta^2(1 - ICC_\delta)/ICC_\delta$, 
$\sigma_\eta^2 = .50\sigma_\delta^2$, $\sigma_\tau^2 = .50\sigma_\eta^2$, 
$\sigma_{u^*}^2 = \sigma_\delta^2 - \sigma_\eta^2$, and $\sigma_\theta^2 = .10\sigma_{u^*}^2$. The same 
approach was used to calculate the pretest variances in (9.3b). 

For Design 4, we conducted simulations for models that included and excluded the pretest scores. 
For the simulations without the pretests, we generated data using (9.3a) by setting $\gamma = 0$ and using 
the same procedures as described above for the model with the pretests to obtain values for the other 
model parameters. 

b. Simulation results 

Table 5 displays simulation results for Design 3. The results indicate that biases of the estimated 
ATEs for the design-based and HLM estimators are very small even if the sample contains only 8 
schools, but that the RCSE estimator is slightly biased upwards in small samples. To assess standard 
error biases, Table 6 displays “true” standard errors for the considered estimators in Columns 2 to 
4 and empirical standard errors in Columns 5 to 7; Table 7 displays associated nominal Type 1 error 
rates. 

The two main findings from Tables 6 and 7 can be summarized as follows: 

• If the number of schools is at least 12, the true standard errors of the three estimators are 
similar (Table 6). With only 8 schools, the true standard errors of the design-based estimator 
are slightly larger than for the HLM and RCSE estimators with normally distributed 
disturbances, but not if the model disturbances have bimodal normal distributions. For all 
specifications, however, differences in the true precision of the estimators disappear if the 
sample contains at least 12 schools. 

• If the number of schools is at least 12, the empirical standard errors of the three estimators 
align with their true standard errors. With smaller samples, the empirical standard errors 
of the design-based estimator are downwardly biased (Table 6) and Type 1 errors are inflated 
(Table 7). However, these biases become negligible with more clusters. The downward biases 
are more pronounced for the RCSE estimator, even with large cluster samples (a result also 
found in Angrist and Pischke, 2009 and Green and Vavreck, 2008). Importantly, we find 
that the design-based and RCSE findings are very similar if the design-based approach 
weights schools by their student sample sizes (not shown). The HLM estimator tends to 
perform well across the considered specifications. 
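As a worked illustration of the standard error comparison, the ratio of the average estimated standard error to the “true” standard error can be computed from the Table 6 entries for 8 schools with normally distributed errors. The dictionary layout below is only a convenience for the example; ratios below 1 indicate downward bias.

```python
# Table 6, normal errors, 8 schools (5 treatment, 3 control).
true_se = {"design-based": 3.50, "HLM": 3.10, "RCSE": 3.16}  # SD of estimated ATEs
avg_se = {"design-based": 2.93, "HLM": 2.97, "RCSE": 2.29}   # average estimated SEs

# Ratio of average estimated SE to "true" SE for each estimator.
ratios = {name: avg_se[name] / true_se[name] for name in true_se}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
```

The RCSE ratio is the smallest of the three for this row, consistent with the more pronounced downward bias noted for that estimator.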

Table 8 displays simulation results for the design-based estimator for the Design 4 SP model without 
pretests, and Table 9 presents corresponding results for the model that includes the pretests. We 
find that biases of the ATE estimators are very small for both specifications. Furthermore, the “true” 
standard errors align with the empirical ones if the sample contains at least 12 study sites. The 
pattern of results is similar if the number of schools per site is varied, including a matched pair 
design with 1 treatment and 1 control group school per site (not shown). 

The simulation findings suggest that the design-based ATE estimator performs well for clustered 
RCTs. Biases of the estimated ATEs are negligible if the sample contains at least 8 schools. 
Furthermore, with a sample of at least 12 schools, the empirical standard errors produced by the 
design-based approach align with the true standard errors, and are comparable to those for the HLM 
and RCSE estimators. Thus, the design-based approach, which is fully based on the random 
assignment mechanism and simple asymptotic variance approximations, is likely to perform well 
under a range of RCT settings. 

Table 5. Simulation results for Design 3: average of estimated ATEs across replications 

                          Average of estimated ATEs across replications
Total clusters            Design-based     HLM            RCSE
(treatment, control)      estimator        estimator      estimator

Normal distributions for the error terms
 8 (5,3)                  3.02             3.03           3.11
12 (8,4)                  3.01             3.04           3.11
16 (10,6)                 3.01             3.03           3.08
20 (12,8)                 3.02             3.03           3.06
40 (24,16)                2.98             2.98           3.00
60 (36,24)                2.99             2.99           3.00

Bimodal normal distributions
 8 (5,3)                  2.94             2.98           3.10
12 (8,4)                  2.99             2.99           3.13
16 (10,6)                 3.02             3.02           3.08
20 (12,8)                 2.96             2.97           3.00
40 (24,16)                3.01             3.02           3.03
60 (36,24)                2.99             2.99           3.01

Chi-squared distributions (with zero means)
 8 (5,3)                  2.99             3.01           3.10
12 (8,4)                  2.97             2.99           3.10
16 (10,6)                 2.99             3.01           3.05
20 (12,8)                 2.99             3.00           3.05
40 (24,16)                3.01             3.01           3.02
60 (36,24)                2.99             3.00           3.01

Chi-squared distributions (with zero means) where the estimation model includes the natural log of pretest scores
 8 (5,3)                  3.00             3.02           3.12
12 (8,4)                  2.98             3.02           3.12
16 (10,6)                 3.03             3.04           3.10
20 (12,8)                 3.00             3.00           3.03
40 (24,16)                2.99             2.99           3.00
60 (36,24)                2.99             2.99           3.00

Notes: The figures are averages of estimated ATEs across 10,000 replications for each estimator and specification. See the text for details 
on the calculations. 
RCSE = robust cluster standard error 

Table 6. Simulation results for Design 3: standard error estimates across replications 

                        Standard deviation of estimated ATEs      Average of estimated standard errors
                        across replications                       across replications (a)
                        (“true” standard errors)
Total clusters          Design-based  HLM         RCSE            Design-based   HLM            RCSE
(treatment, control)    estimator     estimator   estimator       estimator      estimator      estimator

Normal distributions for the error terms
 8 (5,3)                3.50          3.10        3.16            2.93 (1.08)    2.97 (0.88)    2.29 (0.80)
12 (8,4)                2.78          2.57        2.61            2.51 (0.74)    2.52 (0.59)    2.09 (0.63)
16 (10,6)               2.29          2.16        2.18            2.15 (0.48)    2.13 (0.43)    1.88 (0.46)
20 (12,8)               2.00          1.89        1.91            1.92 (0.36)    1.89 (0.34)    1.72 (0.36)
40 (24,16)              1.40          1.35        1.35            1.37 (0.18)    1.35 (0.17)    1.27 (0.19)
60 (36,24)              1.14          1.10        1.09            1.12 (0.11)    1.10 (0.11)    1.06 (0.13)

Bimodal normal distributions
 8 (5,3)                3.12          3.22        3.18            2.61 (0.95)    3.14 (0.86)    2.31 (0.74)
12 (8,4)                2.48          2.73        2.65            2.22 (0.65)    2.65 (0.55)    2.09 (0.58)
16 (10,6)               2.02          2.26        2.18            1.89 (0.42)    2.24 (0.39)    1.88 (0.43)
20 (12,8)               1.78          2.00        1.91            1.68 (0.31)    1.99 (0.30)    1.72 (0.33)
40 (24,16)              1.22          1.41        1.34            1.20 (0.15)    1.41 (0.15)    1.26 (0.17)
60 (36,24)              0.99          1.14        1.07            0.98 (0.10)    1.16 (0.10)    1.05 (0.12)

Chi-squared distributions (with zero means)
 8 (5,3)                3.69          3.33        3.54            3.09 (1.18)    3.18 (0.99)    2.49 (0.94)
12 (8,4)                2.91          2.76        2.95            2.63 (0.81)    2.70 (0.68)    2.29 (0.75)
16 (10,6)               2.39          2.31        2.47            2.25 (0.54)    2.29 (0.50)    2.09 (0.60)
20 (12,8)               2.12          2.07        2.20            2.01 (0.40)    2.03 (0.39)    1.93 (0.49)
40 (24,16)              1.46          1.45        1.55            1.44 (0.20)    1.46 (0.20)    1.46 (0.28)
60 (36,24)              1.19          1.19        1.27            1.18 (0.13)    1.20 (0.13)    1.22 (0.19)

Chi-squared distributions (with zero means) where the estimation model includes the natural log of pretest scores
 8 (5,3)                3.69          3.32        3.54            3.09 (1.16)    3.19 (0.99)    2.50 (0.93)
12 (8,4)                2.94          2.80        2.99            2.64 (0.81)    2.72 (0.68)    2.30 (0.75)
16 (10,6)               2.40          2.30        2.47            2.25 (0.53)    2.29 (0.50)    2.09 (0.59)
20 (12,8)               2.12          2.06        2.19            2.01 (0.41)    2.04 (0.40)    1.94 (0.50)
40 (24,16)              1.46          1.45        1.54            1.43 (0.20)    1.45 (0.20)    1.46 (0.28)
60 (36,24)              1.18          1.18        1.26            1.17 (0.13)    1.19 (0.13)    1.21 (0.19)

Notes: The figures are based on 10,000 replications for each estimator and specification. See the text for details on the calculations. 
RCSE = robust cluster standard error 
(a) Standard deviations are shown in parentheses. 

Table 7. Simulation results for Design 3: Type 1 errors across replications 

                          Percentage of t-statistics that are statistically significant
Total clusters            Design-based     HLM            RCSE
(treatment, control)      estimator        estimator      estimator

Normal distributions for the error terms
 8 (5,3)                  .077             .038           .106
12 (8,4)                  .070             .046           .099
16 (10,6)                 .062             .046           .083
20 (12,8)                 .057             .044           .074
40 (24,16)                .055             .047           .063
60 (36,24)                .054             .053           .060

Bimodal normal distributions
 8 (5,3)                  .081             .043           .105
12 (8,4)                  .075             .050           .108
16 (10,6)                 .064             .049           .081
20 (12,8)                 .061             .049           .070
40 (24,16)                .052             .048           .062
60 (36,24)                .050             .046           .053

Chi-squared distributions (with zero means)
 8 (5,3)                  .080             .036           .107
12 (8,4)                  .071             .042           .104
16 (10,6)                 .057             .046           .088
20 (12,8)                 .056             .048           .082
40 (24,16)                .052             .048           .066
60 (36,24)                .053             .048           .060

Chi-squared distributions (with zero means) where the estimation model includes the natural log of pretest scores
 8 (5,3)                  .074             .035           .106
12 (8,4)                  .073             .046           .108
16 (10,6)                 .063             .046           .092
20 (12,8)                 .055             .045           .076
40 (24,16)                .053             .050           .063
60 (36,24)                .048             .046           .057

Notes: The figures are the percentages of t-statistics that are statistically significant across 10,000 replications for each estimator and 
specification. See the text for details on the calculations. 
RCSE = robust cluster standard error 

Table 8. Simulation results for Design 4 for the design-based SP estimator without covariates 

                 Average of         Standard deviation     Average of estimated
Total sites      estimated ATEs     of estimated ATEs      standard errors (a)

Normal distributions for the error terms
 4               3.03               3.02                   2.78 (1.19)
 8               2.99               2.14                   2.06 (0.57)
12               2.98               1.72                   1.71 (0.38)
16               3.01               1.51                   1.48 (0.28)
20               3.00               1.34                   1.34 (0.23)
40               3.01               0.94                   0.95 (0.11)

Bimodal normal distributions
 4               3.02               3.33                   3.05 (1.29)
 8               3.01               2.32                   2.26 (0.61)
12               2.98               1.90                   1.87 (0.40)
16               3.00               1.65                   1.63 (0.30)
20               3.01               1.47                   1.46 (0.24)
40               2.99               1.03                   1.04 (0.12)

Chi-squared distributions (with zero means)
 4               2.98               3.17                   2.94 (1.30)
 8               3.01               2.24                   2.19 (0.64)
12               3.02               1.85                   1.80 (0.42)
16               2.99               1.62                   1.57 (0.31)
20               2.99               1.44                   1.41 (0.25)
40               3.03               1.02                   1.01 (0.12)

Notes: The figures are based on 10,000 replications for each estimator and specification. See the text for details on the calculations. 
(a) Standard deviations are shown in parentheses. 

Table 9. Simulation results for Design 4 for the design-based SP estimator with covariates 

                 Average of         Standard deviation     Average of estimated
Total sites      estimated ATEs     of estimated ATEs      standard errors (a)

Normal distributions for the error terms
 4               3.02               2.98                   1.62 (0.80)
 8               2.98               1.64                   1.38 (0.40)
12               2.99               1.29                   1.17 (0.27)
16               2.99               1.11                   1.03 (0.20)
20               2.99               0.98                   0.94 (0.16)
40               2.99               0.68                   0.67 (0.08)

Bimodal normal distributions
 4               2.98               2.63                   1.54 (0.72)
 8               2.97               1.43                   1.26 (0.37)
12               3.02               1.13                   1.08 (0.25)
16               2.99               0.95                   0.94 (0.18)
20               2.99               0.85                   0.85 (0.15)
40               3.00               0.58                   0.61 (0.07)

Chi-squared distributions (with zero means)
 4               2.98               3.19                   1.73 (0.90)
 8               3.02               1.79                   1.47 (0.46)
12               2.96               1.39                   1.25 (0.31)
16               3.00               1.19                   1.10 (0.23)
20               3.00               1.05                   1.00 (0.19)
40               3.00               0.73                   0.72 (0.10)

Notes: The figures are based on 10,000 replications for each estimator and specification. See the text for details on the calculations. 
(a) Standard deviations are shown in parentheses. 

Appendix A: Mathematical proofs 


Proof of Lemma 5.1 

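The argument in (5.3) in the main text proved that $E_R(\bar{y}_T - \bar{y}_C) = \beta_{nclus,FP}$, so that 
$\hat{\beta}_{nclus,FP}$ is unbiased. To calculate the moments of $\hat{\beta}_{nclus,FP}$, it is convenient 
to use the regression model in (5.4) to express $\hat{\beta}_{nclus,FP}$ as follows: 

(A.1) $\hat{\beta}_{nclus,FP} = \dfrac{\sum_{i=1}^{n}(T_i - p)[\beta_0 + \beta_{nclus,FP}(T_i - p) + u_i]}{np(1-p)} = \beta_{nclus,FP} + \dfrac{\sum_{i=1}^{n}(T_i - p)u_i}{np(1-p)}$, 

where the last equality holds because $\sum_i (T_i - p) = 0$ and $\sum_i (T_i - p)^2 = np(1-p)$. 
Substituting for $u_i$ using (5.4) and (5.4a) yields: 

(A.2) $(\hat{\beta}_{nclus,FP} - \beta_{nclus,FP}) = \dfrac{\sum_{i=1}^{n}[\alpha_i(T_i - p) + \tau_i(T_i - p)^2]}{np(1-p)} = \dfrac{\sum_{i=1}^{n}T_i[\alpha_i + (1-2p)\tau_i]}{np(1-p)} = \dfrac{\sum_{i=1}^{n}T_i l_i}{np(1-p)}$; 
$l_i = (1-p)(Y_i(1) - \bar{Y}(1)) + p(Y_i(0) - \bar{Y}(0))$. 

Using (A.2), the variance of $\hat{\beta}_{nclus,FP}$ is: 

$Var_R(\hat{\beta}_{nclus,FP}) = \dfrac{Var_R(\sum_{i=1}^{n} l_i T_i)}{[np(1-p)]^2} = \dfrac{p(1-p)\left(\sum_{i=1}^{n} l_i^2 - \frac{1}{n-1}\sum_{i=1}^{n}\sum_{i' \neq i} l_i l_{i'}\right)}{[np(1-p)]^2}$, 

where the last equality holds because $Var_R(T_i) = p(1-p)$ and $Cov_R(T_i, T_{i'}) = -p(1-p)/(n-1)$. 
Because $\sum_i l_i = 0$, it follows that $(\sum_i l_i)^2 = 0$, and thus, 
$\sum_i \sum_{i' \neq i} l_i l_{i'} = -\sum_i l_i^2$. Hence, 

(A.3) $Var_R(\hat{\beta}_{nclus,FP}) = \dfrac{\sum_{i=1}^{n} l_i^2}{np(1-p)(n-1)} = \dfrac{\sum_{i=1}^{n}[(1-p)(Y_i(1) - \bar{Y}(1)) + p(Y_i(0) - \bar{Y}(0))]^2}{np(1-p)(n-1)} = \dfrac{(1-p)}{np}S_T^2 + \dfrac{p}{n(1-p)}S_C^2 + \dfrac{2}{n}S_{TC}^2$. 

Using $S_\tau^2 = S_T^2 + S_C^2 - 2S_{TC}^2$ and solving for $S_{TC}^2$ yields the variance expression in 
(5.6), and the asymptotic variance expression in (5.8) follows directly from (5.7). 

The asymptotic normality of $\hat{\beta}_{nclus,FP}$ follows by expressing (A.1) as 
$\sqrt{n}\,p(1-p)(\hat{\beta}_{nclus,FP} - \beta_{nclus,FP}) = \sum_{i=1}^{n}(T_i - p)u_i/\sqrt{n}$ and using 
a central limit theorem for finite populations (see, for example, Freedman 2006, Hogland 1978, and 
Hajek 1960). 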
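The finite-population variance in (A.3) can be verified by brute force on a small example. The sketch below is an illustration under stated assumptions, not part of the proof: it takes the first expression in (A.3) to be $\sum_i l_i^2 / [np(1-p)(n-1)]$ with $l_i = (1-p)(Y_i(1) - \bar{Y}(1)) + p(Y_i(0) - \bar{Y}(0))$, uses made-up potential outcomes, and enumerates every assignment of $np$ units to treatment. Function names are hypothetical.

```python
from itertools import combinations
from statistics import fmean


def randomization_moments(y1, y0, n_treat):
    """Exact mean and variance of the difference in means over all assignments."""
    n = len(y1)
    diffs = []
    for treated in combinations(range(n), n_treat):
        chosen = set(treated)
        y_t = fmean([y1[i] for i in range(n) if i in chosen])
        y_c = fmean([y0[i] for i in range(n) if i not in chosen])
        diffs.append(y_t - y_c)
    center = fmean(diffs)
    return center, fmean([(d - center) ** 2 for d in diffs])


def lemma_variance(y1, y0, n_treat):
    """Sum of l_i^2 divided by n p (1 - p) (n - 1), the assumed form of (A.3)."""
    n = len(y1)
    p = n_treat / n
    m1, m0 = fmean(y1), fmean(y0)
    l = [(1 - p) * (a - m1) + p * (b - m0) for a, b in zip(y1, y0)]
    return sum(v * v for v in l) / (n * p * (1 - p) * (n - 1))


# Hypothetical potential outcomes for n = 6 units, 3 assigned to treatment.
y1 = [3.0, 5.0, 8.0, 10.0, 4.0, 7.0]
y0 = [1.0, 4.0, 6.0, 7.0, 5.0, 2.0]
mean_diff, exact_var = randomization_moments(y1, y0, 3)
print(mean_diff, exact_var, lemma_variance(y1, y0, 3))
```

The enumerated mean equals $\bar{Y}(1) - \bar{Y}(0)$ (unbiasedness) and the enumerated variance matches the formula, since the result is exact for complete randomization rather than only asymptotic.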

Proof of Lemma 5.2 

Using the notation from the proof of Lemma 5.1, the SP estimator can be expressed as 
$\hat{\beta}_{nclus,SP} = \sum_{i=1}^{n}\tilde{T}_i y_i / np(1-p) = \bar{y}_T - \bar{y}_C$. Substituting for 
$y_i$ using (5.16) yields 
$(\hat{\beta}_{nclus,SP} - \beta_{nclus,SP}) = \sum_{i=1}^{n}\tilde{T}_i u_i / np(1-p)$. Thus, 
$E_{RI}(\hat{\beta}_{nclus,SP}) = \beta_{nclus,SP}$ because of (5.17), and the variance expression in (5.19) 
is obtained using (5.18). Asymptotic normality follows by applying a conventional central limit theorem 
to $\sum_{i=1}^{n}\tilde{T}_i u_i / \sqrt{n}$ (see, for example, Rao 1973). 

Proof of Lemma 5.3 

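Let $z_i = (1 \;\; \tilde{T}_i \;\; \tilde{x}_i)$ be centered model explanatory variables with the associated 
parameter vector $(\beta_0 \;\; \beta_{nclus,FP} \;\; \gamma)$. The multiple regression estimator can then be 
expressed as 

(A.4) $\begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_{nclus,MR,FP} \\ \hat{\gamma} \end{pmatrix} = \left[\left(\sum_{i=1}^{n} z_i' z_i\right)^{-1} \sum_{i=1}^{n} z_i' y_i\right] = \begin{pmatrix} 1 & 0 & 0 \\ 0 & p(1-p) & \sum_{i=1}^{n}\tilde{T}_i\tilde{x}_i/n \\ 0 & \sum_{i=1}^{n}\tilde{x}_i'\tilde{T}_i/n & \sum_{i=1}^{n}\tilde{x}_i'\tilde{x}_i/n \end{pmatrix}^{-1} \begin{pmatrix} \sum_{i=1}^{n} y_i/n \\ \sum_{i=1}^{n}\tilde{T}_i y_i/n \\ \sum_{i=1}^{n}\tilde{x}_i' y_i/n \end{pmatrix}$. 

Because of random assignment, $\sum_{i=1}^{n}\tilde{T}_i\tilde{x}_i/n \xrightarrow{p} 0$, where 
$\xrightarrow{p}$ denotes convergence in probability. Thus, $(\sum_{i=1}^{n} z_i' z_i)/n$ converges to a 
block diagonal matrix as n approaches infinity, and we find that 

(A.5a) $\hat{\beta}_0 = \sum_{i=1}^{n} y_i/n \xrightarrow{p} p\bar{Y}(1) + (1-p)\bar{Y}(0) = \beta_0$ 

(A.5b) $\hat{\beta}_{nclus,MR,FP} \xrightarrow{p} \dfrac{1}{np(1-p)}\sum_{i=1}^{n}\tilde{T}_i y_i \xrightarrow{p} \beta_{nclus,FP}$ 

(A.5c) $\hat{\gamma} \xrightarrow{p} \left(\sum_{i=1}^{n}\tilde{x}_i'\tilde{x}_i/n\right)^{-1}\sum_{i=1}^{n}\left(\tilde{x}_i'(\beta_0 + \beta_{nclus,FP}\tilde{T}_i + u_i)\right)/n = \gamma$. 

Thus, (A.5b) proves that $\hat{\beta}_{nclus,MR,FP}$ is consistent. 

To obtain the asymptotic distribution of $\hat{\beta}_{nclus,MR,FP}$, we apply a standard asymptotic 
expansion to (A.5b): 

(A.6) $\sqrt{n}(\hat{\beta}_{nclus,MR,FP} - \beta_{nclus,FP}) = \dfrac{1}{\sqrt{n}\,p(1-p)}\sum_{i=1}^{n}\tilde{T}_i(y_i - \beta_0 - \beta_{nclus,FP}\tilde{T}_i - \tilde{x}_i\gamma) + o_p(1)$, 

where $o_p(1)$ signifies a term that converges in probability to zero. Using the relation 
$y_i = T_i Y_i(1) + (1 - T_i)Y_i(0)$ and the definitions for $\beta_0$, $\beta_{nclus,FP}$, and $\gamma$ in 
(A.5a) to (A.5c), we can express the right hand side of (A.6) as follows: 

(A.7) $\dfrac{1}{\sqrt{n}\,p(1-p)}\sum_{i=1}^{n}\tilde{T}_i\left(T_i(Y_i(1) - \bar{Y}(1)) + (1 - T_i)(Y_i(0) - \bar{Y}(0)) - \tilde{x}_i\gamma\right) + o_p(1)$. 

Using definitions for $\alpha_i$ and $\tau_i$ from (5.4a), this expression can be further simplified as 

(A.8) $\dfrac{1}{\sqrt{n}\,p(1-p)}\sum_{i=1}^{n}T_i[\alpha_i + (1-2p)\tau_i - \tilde{x}_i\gamma] + o_p(1)$. 

The term inside the brackets in (A.8) sums to zero. Thus, if we define $l_i$ as the bracketed term in 
(A.8), then $\sum_i l_i = 0$, and we can use the same methods as for Lemma 5.1 to derive the asymptotic 
variance of $\hat{\beta}_{nclus,MR,FP}$: 

$Var_R(\hat{\beta}_{nclus,MR,FP}) = \dfrac{1}{np(1-p)}\dfrac{\sum_{i=1}^{n} l_i^2}{n-1} + o(1/n)$ 
$= \left(\dfrac{S_T^2}{np} + \dfrac{S_C^2}{n(1-p)} - \dfrac{S_\tau^2}{n}\right) - \dfrac{\Omega_{x\alpha}'\Omega_{xx}^{-1}\Omega_{x\alpha}}{np(1-p)} - 2(1-2p)\dfrac{\Omega_{x\tau}'\Omega_{xx}^{-1}\Omega_{x\alpha}}{np(1-p)}$. 

Asymptotic normality follows from a standard central limit theorem. 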


Proof of Lemma 5.4 

The main text provides an outline of the proof using the law of iterated expectations and the law of total 
variance. Here, we provide an alternative proof that relies on first principles similar to Lemma 5.3, because 
this approach is used for other designs presented later in this report. 

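Using (A.5b), we have that $\hat{\beta}_{nclus,MR,SP}$ is a consistent estimator for $\beta_{nclus,SP}$ because 
$\hat{\beta}_{nclus,MR,SP} \xrightarrow{p} E_{RI}(\tilde{T}_i y_i)/p(1-p) = E_I(Y_i(1) - Y_i(0)) = \beta_{nclus,SP}$. 
To obtain the asymptotic distribution of $\hat{\beta}_{nclus,MR,SP}$, note that the expectation of the term 
inside the summation sign in (A.6) equals zero. Thus, a simple application of the central limit theorem 
shows that $\hat{\beta}_{nclus,MR,SP}$ has an asymptotically normal distribution with mean 
$\beta_{nclus,SP}$ and variance: 

(A.9) $AsyVar_{RI}(\hat{\beta}_{nclus,MR,SP}) = \dfrac{1}{np^2(1-p)^2}E_{RI}\left(\tilde{T}_i^2(y_i - \beta_0 - \beta_{nclus,SP}\tilde{T}_i - \tilde{x}_i\gamma)^2\right)$. 

Using the relation $y_i = T_i Y_i(1) + (1 - T_i)Y_i(0)$ and plugging into (A.9) the definitions for 
$\beta_0 = p\mu_{TI} + (1-p)\mu_{CI}$, $\beta_{nclus,SP} = (\mu_{TI} - \mu_{CI})$, and $\gamma$, we obtain 
(5.27) after some algebra. 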


Proof of Lemma 5.5 

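For ease of presentation, we assume two subgroups ($s = 2$). The proof is identical for more subgroups, 
but the notation becomes more cumbersome. Let 
$z_i = (G_{i1}\tilde{T}_i \;\; G_{i2}\tilde{T}_i \;\; G_{i1} \;\; G_{i2} \;\; \tilde{x}_i)$ be the model 
explanatory variables with the associated parameter vector 
$(\beta_1 \;\; \beta_2 \;\; \delta_1 \;\; \delta_2 \;\; \gamma)$. Note that $\beta_1 = \beta_{nclus,1,SP}$ is 
the ATE parameter for subgroup 1 (for example, girls) and $\beta_2$ is the ATE parameter for subgroup 2 
(for example, boys). The multiple regression estimator can then be expressed as 

(A.10) $\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\delta}_1 \\ \hat{\delta}_2 \\ \hat{\gamma} \end{pmatrix} = \left(\sum_{i=1}^{n} z_i' z_i/n\right)^{-1}\left(\sum_{i=1}^{n} z_i' y_i/n\right) \xrightarrow{p} [E_{RI}(z_i' z_i)]^{-1}E_{RI}(z_i' y_i)$, 

where $z_i = (G_{i1}\tilde{T}_i \;\; G_{i2}\tilde{T}_i \;\; G_{i1} \;\; G_{i2} \;\; \tilde{x}_i)$. The matrix 
$E_{RI}(z_i' z_i)$ is block diagonal. To see this, we can examine each of the off-diagonal terms in turn: 

1. $E_{RI}(G_{i1}G_{i2}\tilde{T}_i^2) = 0$ because individuals can be in one subgroup level only. 

2. $E_{RI}(G_{i1}G_{i1}\tilde{T}_i) = E_{RI}(G_{i1}\tilde{T}_i) = p(1-p)E_I(G_{i1} \mid T_i = 1) - p(1-p)E_I(G_{i1} \mid T_i = 0) = p(1-p)[q_1 - q_1] = 0$ 
and similarly for $E_{RI}(G_{i2}G_{i2}\tilde{T}_i)$. 

3. $E_{RI}(G_{i1}G_{i2}\tilde{T}_i) = 0$. 

4. $E_{RI}(G_{i1}\tilde{T}_i\tilde{x}_i) = p(1-p)q_1 E_I(\tilde{x}_i \mid G_{i1} = 1, T_i = 1) - p(1-p)q_1 E_I(\tilde{x}_i \mid G_{i1} = 1, T_i = 0) = 0$ 
and similarly for $E_{RI}(G_{i2}\tilde{T}_i\tilde{x}_i)$. 

Note also that $E_{RI}[(G_{i1}\tilde{T}_i)^2] = p(1-p)q_1$ and $E_{RI}[(G_{i2}\tilde{T}_i)^2] = p(1-p)q_2$. 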

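In what follows, we add the variables $G_{i1}$ and $G_{i2}$ into the $\tilde{x}_i$ vector because they could 
be correlated, and label this vector as $\tilde{x}_i^* = (G_{i1} \;\; G_{i2} \;\; \tilde{x}_i)$ with associated 
parameter vector $\gamma^* = (\delta_1 \;\; \delta_2 \;\; \gamma)$. We find then that (A.10) can be expressed 
as follows: 

(A.11) $\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \\ \hat{\gamma}^* \end{pmatrix} \xrightarrow{p} \begin{pmatrix} p(1-p)q_1 & 0 & 0 \\ 0 & p(1-p)q_2 & 0 \\ 0 & 0 & E_{RI}(\tilde{x}_i^{*\prime}\tilde{x}_i^*) \end{pmatrix}^{-1} \begin{pmatrix} E_{RI}(G_{i1}\tilde{T}_i y_i) \\ E_{RI}(G_{i2}\tilde{T}_i y_i) \\ E_{RI}(\tilde{x}_i^{*\prime} y_i) \end{pmatrix}$. 

Solving further, we have that 

(A.12a) $\hat{\beta}_1 \xrightarrow{p} [p(1-p)q_1]^{-1}E_{RI}(G_{i1}\tilde{T}_i y_i)$ 
$= [p(1-p)q_1]^{-1}[p(1-p)q_1 E_I(y_i \mid G_{i1} = 1, T_i = 1) - p(1-p)q_1 E_I(y_i \mid G_{i1} = 1, T_i = 0)]$ 
$= \mu_{T1} - \mu_{C1} = \beta_{nclus,1,SP}$ 

(A.12b) $\hat{\beta}_2 \xrightarrow{p} [p(1-p)q_2]^{-1}E_{RI}(G_{i2}\tilde{T}_i y_i) = \mu_{T2} - \mu_{C2} = \beta_{nclus,2,SP}$ 

(A.12c) $\hat{\gamma}^* \xrightarrow{p} [E_{RI}(\tilde{x}_i^{*\prime}\tilde{x}_i^*)]^{-1}E_{RI}(\tilde{x}_i^{*\prime} y_i) = \gamma^*$. 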


Note that (A.12a) and (A.12b) prove that the multiple regression estimators $\hat{\beta}_1$ and 
$\hat{\beta}_2$ are consistent for the subgroup ATE parameters. The same proof applies to the general 
situation with s subgroups. 

To obtain the asymptotic distribution of $\hat{\beta}_g$ for subgroup g for the general case with s 
subgroups, we apply a standard asymptotic expansion to (A.10): 


(A13) = 


•Jnp[l-p)q, j; 






The expectation of the term inside the first summation sign equals zero. Thus, a simple application of the 
central limit theorem shows that $\hat{\beta}_g$ has an asymptotically normal distribution with mean 
$\beta_{nclus,g,SP}$ and variance: 


(A.U) AsyVar,(P^)= ^ (f ^G„ fa - Z )^ ) . 

Using the relation $y_i = T_i Y_i(1) + (1 - T_i)Y_i(0)$ and plugging into (A.14) the definitions for 
$\beta_g$ and $\gamma^*$ in (A.12a-c), we obtain (5.36) after some algebra. 


Proof of Lemma 5.6 

To consider the asymptotic moments of $\hat{\beta}_{nclus,SP,W}$, we use weighted OLS methods to estimate 
(5.16) using the weights $w_i$. Let $\tilde{T}_i = T_i - p_W$ be the centered treatment status variable, 
where $p_W$ is the weighted treatment group sampling rate. Letting $z_i = (1 \;\; \tilde{T}_i)$, the 
weighted least squares estimator is 

(A.15) $\begin{pmatrix} \hat{\beta}_{0W} \\ \hat{\beta}_{nclus,SP,W} \end{pmatrix} \xrightarrow{p} \begin{pmatrix} E_{RI}(w_i) & 0 \\ 0 & E_{RI}(w_i\tilde{T}_i^2) \end{pmatrix}^{-1} \begin{pmatrix} E_{RI}(w_i y_i) \\ E_{RI}(w_i\tilde{T}_i y_i) \end{pmatrix}$. 

Thus, using Assumptions (5.2), we have that 

(A.16a) $\hat{\beta}_{0W} \xrightarrow{p} [E_I(w_i)]^{-1}E_{RI}(w_i y_i) = [E_I(w_i)]^{-1}[p_W E_I(w_i Y_i(1)) + (1 - p_W)E_I(w_i Y_i(0))]$ 

(A.16b) $\hat{\beta}_{nclus,SP,W} \xrightarrow{p} [E_I(w_i)p_W(1 - p_W)]^{-1}E_I(w_i(Y_i(1) - Y_i(0)))p_W(1 - p_W) = \beta_{nclus,SP,W}$, 

which proves that $\hat{\beta}_{nclus,SP,W}$ is consistent. 

As in previous lemmas, we can obtain the asymptotic distribution of $\hat{\beta}_{nclus,SP,W}$ by applying 
an asymptotic expansion to (A.16b): 

(^.17) 'Jp^{P„clus,SP,W ~ Pndus,SP^ ~ /“ ~ ~ ~ ~ ^^^iPwiyi ~ Mo ~ P ndus ,SpE iW^ (^) • 

Pnp^{\-p^)Epwp ,-.i 

The expectation of the term inside the first summation sign equals zero. Thus, using the central limit theorem, 
we find that $\hat{\beta}_{nclus,SP,W}$ has an asymptotically normal distribution with mean 
$\beta_{nclus,SP,W}$ and variance: 

(A18) AsyVarj^iP„^,^^ sp) = —. T ^ ~ Eo ~ Pndus,spPwf) 

n[p^v(^-Pn')E(wp] 
niPn'i^- Piv)E{wP\ 

We find that (5.47) follows after some algebra. 


Proof of Lemma 6.2 

The consistency of $\hat{\beta}_{nclus,blocked,PATE}$ was established in (6.23) of the main text. To obtain 
the asymptotic variance of $\hat{\beta}_{nclus,blocked,PATE}$, we sequentially use the law of total variance 
using the same conditioning approach that was used to establish the consistency of the ATE estimator. 
First, conditioning on the student and block samples and replacing $\sum_{b=1}^{h} w_b$ by the asymptotic 
approximation $hE_B(w_b)$, we have from the FP model in (6.4) that as the number of blocks gets large 


(A.\9) VarpiP, 


R nclus .blocked .PATE 




'Cb 


j 


hEsiwJ i,=i n^Ph n^0--Pb) « 


Second, using the law of total variance and averaging over the sampling of students from each block, we 
have that: 
(A.20) 


blocked , PATE 
h 




blocked , PATE 




nclus ^blocked ,PA TE 


)) 


1 h —2. 

{ + 


' Cb 


rb 


hEeiwP) t=i n^Ph n^(\-Pb) hE^{wP) 




Because — 1^^) / , (A. 20) reduces to 


(^.21) VarMcius 


nclus .blocked , PA TE 




'Ch 


hEsiWt) b=i n,p, n,{\-Ph) 


]• 


Finally, if we again apply the law of total variance by averaging over the sampling of h blocks, the expression 
in (6.24) follows. Asymptotic normality follows from the central limit theorem. 

The proof of (6.25) involves some tedious algebra, which we outline here. To simplify the notation, let 
$I_b = \hat{\beta}_{nclus,b,PATE}$ be the ATE estimator in block b and let 
$\bar{I} = \hat{\beta}_{nclus,blocked,PATE}$ be the pooled ATE estimator. Our goal is to calculate 


(T.22) 


1 


(h-\')hE,(wA‘ t; 




We focus on the numerator term, which after expanding the squared term can be expressed as follows: 

(A23) E,[hE,Mll)-hw^E,,{T^)-\. 

Note that 


{A2A) hw^E„ Hi ) = (Z K'l + Z Z 


b=\ 


b=\ b'^b 


= E,,{wlll) + {h-\)E,,iw,w,IJ,). 


Thus, if we plug (A.24) into (A.23), we can express (A.23) as 


{A25) {h-\)E,{EMll)-E„{w,w,IJ,)\ . 


We now examine each of the terms inside the brackets. The first term is 

(A26) = 


nh 


^Cb 




^Tb ^Cb 


which follows by expanding the squared term, using the law of iterated expectations, and using the relations 
EATJ,.) = E^{T,{l-W) = , and so on for / ^ /' . 


- 1 ) 




Using similar methods, we can show that the second term in the brackets in (A.25) is 

{All) E^{Wf,w,,IJy) = w^w^,{^n -Ecb)iEn' -Ecb’) ■ 

Gathering terms in (A.26) and (A.27), we can then express (A.25) as follows: 


^2 ^2 

(A28) {h-\)E,X^l{-^ + ^ + {iUn-i^cbf)-w^Wy{iUj^ -Ecb)iEn' -Ea')'\- 


^Tb ^Cb 


Because samples across blocks are independent, we have that 


EBi^h^b'^En Ecb){En' Ea')) ~ Eg{Wb{^rb Ea)) ■ 


Thus, (A. 28) becomes 


2 2 

(A.29) (h-i)E,lwll^+^] + rar,lw,(M„,-fia)]]^ 


^Tb ^Cb 


Inserting (A.29) into (A.22) shows that 


(T.30) 


1 


{h-l)hEg{wi^) 




2 2 
.,2r'-’ri I ^Cb 


hEsiw,) 


^bK[ + —] + Var^{w^{idj^- /dcB)M 


^Tb ^Cb 


which establishes (6.24). 


Proof of Lemma 7.1 

It is convenient to use the centered dependent variable, $\tilde{y}_j = y_j - \bar{y}_W$, in (7.9) and to 
exclude the intercept. The weighted least squares estimator for $\beta_{clus,FP}$ is 

(A.31) $\hat{\beta}_{clus,FP} = \sum_{j=1}^{m} w_j\tilde{T}_j\tilde{y}_j \Big/ \sum_{j=1}^{m} w_j\tilde{T}_j^2$. 

Using the relation in (7.9) that $y_j = T_j Y_j(1) + (1 - T_j)Y_j(0)$, we find that as $m \to \infty$: 
(A.32) 


E,pE^{WjT.yp^ E,pE^{w.TpTYpi) + {l-TpY.{m 


dus,FP 


EFp{Wj)p{l-p) 


J J J 

Eppi^Ppil- p) 


j' 


E,pwpYpi)-Ypmp(\-p) 

Efp{Wj)p{1-p) 


P clus , 


FPa ’ 


which proves that the weighted least squares estimator is consistent for the asymptotic FP parameter. 
To obtain the asymptotic distribution of $\hat{\beta}_{clus,FP}$, we apply a standard asymptotic expansion 
to (A.31): 


(A.33) 


clus,FP 


Pclus^FPa ^ 


~/mE,,p{w,)p{l-p} 


- A»,FTy))l + «,(l) . 


M 


Using the relation in (7.9) and the definition for pp (A-.31), we can express the right hand side of 

(A.33) as follows 


(A.34) 


~JmE„(wi)p(l- p) 


M 


(i))+(i-r.)(7.(0)-7^(0))}]+o ri). 


Using definitions for $\alpha_j$ and $\tau_j$ from (7.9), this expression can be simplified as 


1 m 

(A.35) -Epp['^Wjr/aj+(l-2p)Tj)] + o/l). 

y/mEpp(Wj)p(l-p) y=i 

Let $l_j = (\alpha_j + (1 - 2p)\tau_j)$ and note that $\sum_j w_j l_j = 0$. Thus, the asymptotic variance of $\hat{\beta}_{clus,FP}$ is: 


Epp[Varp ( J] WjTjlj )] p{\- p)Epp [( J] w]l 


.2/2 

J 


1 


AsyVarpCPdu..Fp') = 


i=i 








i=l J *J 


[mEppiwApil-p)] 


[mEppiwApil-p)] 


where the last equality holds because $Var_R(T_j) = p(1-p)$ and $Cov_R(T_j, T_{j'}) = -p(1-p)/(m-1)$. 
Because $\sum_j w_j l_j = 0$, it follows that $(\sum_j w_j l_j)^2 = 0$, and thus, 
$\sum_j \sum_{j' \neq j} w_j w_{j'} l_j l_{j'} = -\sum_j w_j^2 l_j^2$. Hence, 

(A.36) $AsyVar_R(\hat{\beta}_{clus,FP}) =$ 


7=1 


Epp(Wj) p{\- p)m{m-\) 

m 

£ E,,[w] {(1 - p){Y^ (1) - (1)) + p{Y^ (0) - (0))}^ ] 


7=1 


Epp{w^) p{\- p)m{m-\) 


1 

Epp{Wjf 


^2 ^2 
^TIV I ^CIV 

mp m(\-p) 



m 
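The randomization moments and the algebra behind (A.36) can be checked exactly on a small example by enumerating every assignment. The Python sketch below is illustrative only: the potential outcomes and weights are made-up values, the sample mean of the weights stands in for $E_{FP}(w_j)$, and the three variance components are computed as weighted sums of squares divided by $m-1$, which is an assumption about how $S^2_{TW}$, $S^2_{CW}$, and $S^2_{\tau W}$ are defined in the main text.

```python
from itertools import combinations
import numpy as np

# Hypothetical finite population: m = 6 clusters, exactly 2 treated
m, m_t = 6, 2
p = m_t / m
y1 = np.array([12.0, 15.0, 9.0, 20.0, 11.0, 14.0])  # illustrative Y_j(1)
y0 = np.array([10.0, 11.0, 8.0, 15.0, 12.0, 10.0])  # illustrative Y_j(0)
w = np.array([20.0, 35.0, 25.0, 40.0, 30.0, 50.0])  # illustrative weights

# All equally likely assignments with exactly m_t treated clusters
T = np.array([[1.0 if j in treated else 0.0 for j in range(m)]
              for treated in combinations(range(m), m_t)])
T_tilde = T - p  # centered indicators (assumed T_j - p)

# Exact randomization variance and covariance of the centered indicators
var_t = np.mean(T_tilde[:, 0] ** 2)
cov_t = np.mean(T_tilde[:, 0] * T_tilde[:, 1])

# l_j = (1 - p) * A_j + p * B_j with weighted-mean centering, so sum(w * l) = 0
a = y1 - np.sum(w * y1) / np.sum(w)
b = y0 - np.sum(w * y0) / np.sum(w)
l = (1 - p) * a + p * b

# Exact variance of sum_j w_j T_tilde_j l_j versus its closed form
exact_var = np.var(T_tilde @ (w * l))
closed_form = p * (1 - p) * m / (m - 1) * np.sum(w**2 * l**2)

# Ratio form of the variance versus the three-term decomposition
w_bar = np.mean(w)  # stand-in for E_FP(w_j)
ratio_form = np.sum(w**2 * l**2) / (w_bar**2 * p * (1 - p) * m * (m - 1))
s2_t = np.sum(w**2 * a**2) / (m - 1)
s2_c = np.sum(w**2 * b**2) / (m - 1)
s2_tau = np.sum(w**2 * (a - b) ** 2) / (m - 1)
three_term = (s2_t / (m * p) + s2_c / (m * (1 - p)) - s2_tau / m) / w_bar**2
```

In this sketch `exact_var` matches `closed_form` and `ratio_form` matches `three_term` up to floating-point error, mirroring the two equalities used in the proof.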


The asymptotic normality of $\hat{\beta}_{clus,FP}$ follows from (A.35) using a central limit theorem for finite populations
(see for example, Freedman 2006, Hogland 1978, and Hajek 1960). 


Proof of Lemma 7.2 

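It is convenient to use the centered dependent variable $\tilde{y}_j = \bar{y}_j - \bar{\bar{y}}_W$ in the regression model and to exclude the intercept. The weighted multiple regression estimator for the parameter vector is

$$\begin{pmatrix} \hat{\beta}_{clus,MR,FP,W} \\ \hat{\gamma} \end{pmatrix} = \Big[\Big(\sum_{j=1}^{m} \tilde{z}_j' w_j \tilde{z}_j\Big)^{-1} \sum_{j=1}^{m} \tilde{z}_j' w_j \tilde{y}_j\Big] = \begin{pmatrix} \sum_{j=1}^{m} w_j \tilde{T}_j^2 / m & \sum_{j=1}^{m} w_j \tilde{T}_j \tilde{x}_j / m \\ \sum_{j=1}^{m} \tilde{x}_j' w_j \tilde{T}_j / m & \sum_{j=1}^{m} \tilde{x}_j' w_j \tilde{x}_j / m \end{pmatrix}^{-1} \begin{pmatrix} \sum_{j=1}^{m} w_j \tilde{T}_j \tilde{y}_j / m \\ \sum_{j=1}^{m} \tilde{x}_j' w_j \tilde{y}_j / m \end{pmatrix}. \tag{A.37}$$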

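Because of random assignment, $\sum_{j=1}^{m} \tilde{x}_j' w_j \tilde{T}_j / m \xrightarrow{p} 0$. Thus, $\sum_{j=1}^{m} \tilde{z}_j' w_j \tilde{z}_j / m$ converges to a block diagonal matrix as $m$ approaches infinity, and we find that

$$\hat{\beta}_{clus,MR,FP,W} \xrightarrow{p} \frac{E_{FP}[E_R(w_j \tilde{T}_j \tilde{y}_j)]}{E_{FP}(w_j)p(1-p)} = \beta_{clus,FPa}, \tag{A.38a}$$

$$\hat{\gamma} = \Big(\sum_{j=1}^{m} \tilde{x}_j' w_j \tilde{x}_j / m\Big)^{-1} \sum_{j=1}^{m} (\tilde{x}_j' w_j(\alpha_j + \tau_j \tilde{T}_j)) / m \xrightarrow{p} \Omega_{xWx}^{-1}\Omega_{xW\alpha} = \gamma. \tag{A.38b}$$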


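Thus, (A.38a) proves that $\hat{\beta}_{clus,MR,FP,W}$ is consistent. To obtain the asymptotic distribution of $\hat{\beta}_{clus,MR,FP,W}$, we apply a standard asymptotic expansion to (A.37):

$$\sqrt{m}(\hat{\beta}_{clus,MR,FP,W} - \beta_{clus,FPa}) = \frac{1}{\sqrt{m}E_{FP}(w_j)p(1-p)} E_{FP}\Big[\sum_{j=1}^{m} (w_j \tilde{T}_j(\tilde{y}_j - \beta_{clus,FP}\tilde{T}_j - \tilde{x}_j\gamma))\Big] + o_p(1). \tag{A.39}$$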


Using the proof of Lemma 7.1, we can express (A.39) as

$$\frac{1}{\sqrt{m}E_{FP}(w_j)p(1-p)} E_{FP}\Big[\sum_{j=1}^{m} w_j \tilde{T}_j(\alpha_j + (1-2p)\tau_j - \tilde{x}_j\gamma)\Big] + o_p(1). \tag{A.40}$$

Defining $l_j = (\alpha_j + (1-2p)\tau_j - \tilde{x}_j\gamma)$, the remainder of the proof follows using the same methods as for Lemma 7.1.
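The role of random assignment in making the cross-moment between the covariates and the centered treatment indicator vanish can be illustrated with a small exact calculation. The Python sketch below uses hypothetical covariate values and weights, assumes a single covariate centered at its weighted mean and $\tilde{T}_j = T_j - p$, and introduces a helper `weighted_mr` (a name chosen here, not part of RCT-YES) that solves the weighted normal equations for the no-intercept regression.

```python
from itertools import combinations
import numpy as np

def weighted_mr(y_tilde, t_tilde, x_tilde, w):
    """Solve the weighted normal equations for (beta, gamma), no intercept."""
    z = np.column_stack([t_tilde, x_tilde])
    lhs = z.T @ (w[:, None] * z)   # sum_j z_j' w_j z_j
    rhs = z.T @ (w * y_tilde)      # sum_j z_j' w_j y_j
    return np.linalg.solve(lhs, rhs)

# Hypothetical data: m = 6 clusters, 3 treated
m, m_t = 6, 3
p = m_t / m
x = np.array([3.0, 7.0, 1.0, 9.0, 4.0, 6.0])         # illustrative covariate
w = np.array([20.0, 35.0, 25.0, 40.0, 30.0, 50.0])   # illustrative weights
x_tilde = x - np.sum(w * x) / np.sum(w)

# Cross-moment sum_j x_j w_j T_tilde_j / m for every possible assignment
cross = [np.sum(x_tilde * w * (np.isin(np.arange(m), treated) - p)) / m
         for treated in combinations(range(m), m_t)]
mean_cross = np.mean(cross)  # randomization expectation of the cross-moment

# Sanity check of the solver on noiseless data with known coefficients
t_tilde = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0]) - p
y_tilde = 2.0 * t_tilde + 3.0 * x_tilde
coef = weighted_mr(y_tilde, t_tilde, x_tilde, w)
```

The cross-moment is nonzero for any single assignment but averages to zero over the randomization distribution, which is the finite-sample counterpart of the block diagonal limit used in the proof.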


Proof of Lemma 7.3 

The consistency of $\hat{\beta}_{clus,PATE}$ was established in (7.28) of the main text and asymptotic normality follows using a standard central limit theorem. To obtain the asymptotic variance of $\hat{\beta}_{clus,PATE}$, we sequentially use the law of total variance using the same conditioning approach that was used to establish consistency. First, averaging over the super-population of students within the study schools conditional on the school samples and treatment assignments yields

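$$Var_I(\hat{\beta}_{clus,PATE}) = \frac{1}{(\sum_{j=1}^{m} T_j w_j)^2} \sum_{j=1}^{m} T_j w_j^2 \frac{\sigma^2_{Tj}}{n_j} + \frac{1}{(\sum_{j=1}^{m} (1-T_j) w_j)^2} \sum_{j=1}^{m} (1-T_j) w_j^2 \frac{\sigma^2_{Cj}}{n_j}. \tag{A.41}$$

Second, using the law of total variance and averaging over the randomization distribution, we have asymptotically that

$$Var_{RI}(\hat{\beta}_{clus,PATE}) = E_R(Var_I(\hat{\beta}_{clus,PATE})) + Var_R(E_I(\hat{\beta}_{clus,PATE})) = \left[\frac{p\sum_{j=1}^{m} w_j^2 \sigma^2_{Tj}/n_j}{E_R(\sum_{j=1}^{m} w_j T_j)^2} + \frac{(1-p)\sum_{j=1}^{m} w_j^2 \sigma^2_{Cj}/n_j}{E_R(\sum_{j=1}^{m} w_j (1-T_j))^2}\right] + \left[\frac{S^2_{TW}}{mp\bar{w}^2} + \frac{S^2_{CW}}{m(1-p)\bar{w}^2} - \frac{S^2_{\tau W}}{m\bar{w}^2}\right], \tag{A.42}$$

where $S^2_{\tau W} = \sum_{j=1}^{m} w_j^2((\mu_{Tj} - \mu_{TW}) - (\mu_{Cj} - \mu_{CW}))^2/(m-1)$. The second bracketed term is the between-school variance component and follows using results for the clustered FP model.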

In (A.42), we have that $E_R(\sum_{j=1}^{m} w_j T_j)^2 = m^2 p^2 E_S(w_j)^2$ and $E_R(\sum_{j=1}^{m} w_j(1-T_j))^2 = m^2(1-p)^2 E_S(w_j)^2$. Thus, the within-school variance components are of lower order than the between-school variance terms, which are $O_p(1/m)$. Therefore, our first-order asymptotic approximation excludes the within-school variance terms.

Finally, (7.29) follows if we average over samples from the school super-population and use the relation

$$Var_{RIS}(\hat{\beta}_{clus,PATE}) = E_S(Var_{RI}(\hat{\beta}_{clus,PATE})) + Var_S(E_{RI}(\hat{\beta}_{clus,PATE})), \tag{A.43}$$

where $Var_S(E_{RI}(\hat{\beta}_{clus,PATE})) = \sigma^2_{\tau W} / mE_S(w_j)^2$.